1.0  INTRODUCTION

This  report  describes  a  stochastic  parametric  sensitivity  analysis  of 
a  detailed  structural  model  of  the  U.S.  economic  system.   Model  parameters 
defining  the  structure  of  the  system  at  the  time  of  measurement  were  derived 
from  physical  observations  of  the  system.   Use  of  such  models  is  becoming 
increasingly prevalent for mid- to long-range studies and policy analyses in
government planning at all levels. Resource scarcity, foreign policy
contingencies and other factors have made rapid structural change the object of
analysis, not something one can assume away. Effective use of such models
requires an understanding of the effects of parametric change and
uncertainty.

We  are  concerned  here  with  a  linear  static  input-output  model  of  the 
U.S. economy. Its parameters are derived from data on interindustry
transactions compiled by the U.S. Department of Commerce. Due to the
size and complexity of the economic system, funding limitations and
measurement lags, these parameters are seven years out of date when published.
Parametric  uncertainty  therefore  can  arise  from  two  sources:   observation 
of  the  system  during  the  base  year  and  structural  changes  during  the  seven 
year  lag  period.   Estimates  of  uncertainty  in  the  base  year  parameters  were 
compiled  by  Bullard  (1976)  and  are  discussed  in  Appendix  A. 

The effect of parametric uncertainty on model outputs has been
discussed by Sebald (1974) and Bullard and Sebald (1975). These papers
quantified  the  maximum  error  tolerances  that  would  result  from  the  worst- 
case  distribution  of  parametric  errors.   For  this  model,  it  was  found  that 
the  process  of  matrix  inversion  could  magnify  input  errors  by  more  than 


emphasizing  the  need  for  developing  a  methodology  that  could 
quantify  the  extent  to  which  parametric  errors  cancel  one  another. 

The  Monte  Carlo  simulation  analysis  described  here  was  designed  to 
answer that question. Base year interindustry transactions were characterized
as random variables and the model parameters were derived from them.
The results from each simulation were used to update a set of sufficient
statistics to yield unbiased estimates of means, variances and some
covariances. The simulations were performed to evaluate both the effect of doubling
error  tolerances  on  inputs  and  the  effect  of  changing  the  structure  of  the 
model  to  enhance  its  usefulness  for  predictive  work. 

After  200  simulations,  the  preliminary  results  were  analyzed  in  order 
to  determine  the  cost-effectiveness  of  proceeding  with  more  simulations. 
In  all  cases,  this  a  priori  determination  of  the  confidence  intervals  on 
final  results  showed  that  1000  runs  would  be  adequate.   These  estimates  were 
then  verified  when  the  simulation  had  been  completed. 

Chapter  2  describes  the  preparation  of  the  data  base  and  estimation  of 
uncertainty  on  base  year  transactions.   Chapter  3  details  the  simulation 
methodology,  the  criteria  for  determining  "acceptability"  of  simulated 
parameters and derivation of the stopping rule. Chapter 4 presents the
results of all simulations, and discusses the effects of aggregation,
magnitude of input uncertainty and other variables.


2.0  DATA  BASE  PREPARATION 


2.1  The  Model 


The  linear  static  input-output  model  of  the  U.  S.  economic  system  is 
described  in  detail  by  Bullard  and  Herendeen  (1975).   It  is  based  on  the 
theory developed by Leontief (1941), and relies largely on data assembled by
the U. S. Department of Commerce, Bureau of Economic Analysis (BEA). Data
are expressed in constant dollars, which act as a surrogate for physical
units. In this particular model, however, the inputs of energy to all sectors
are  expressed  in  physical  units,  to  take  account  of  the  fact  that  energy  is 
sold  to  different  users  at  different  prices. 

The  governing  equation  of  the  model  is 

(I-A)  X  =  Y  (2.1-1) 

where  X  is  an  N-order  vector  of  gross  domestic  outputs  for  each  sector,  Y  is 
the  vector  of  total  final  demands  for  the  output  of  each  sector,  and  A  is  the 
matrix  of  parameters  describing  the  technology  of  producing  goods  and  services 
during the base year. A typical element $A_{ij}$ represents the amount of input
from sector i required directly by sector j to produce one unit of its
output. These parameters are derived from base year observations of interindustry
transactions, $T_{ij}$ (amount of output from sector i sold directly to sector j):

$A_{ij} = \frac{T_{ij}}{X_j}$  (2.1-2)

In turn, these interindustry transactions are defined as the sum

$T = DA + MDT + TF$  (2.1-3)

where $DA_{ij}$ is the amount of product i sold directly to sector j, $MDT_{ij}$
represents the transportation or trade margin i on all inputs to sector j,
and $TF_{ij}$ represents the amount of product j produced as a secondary output
by sector i.
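As a minimal sketch of eqs. (2.1-1) and (2.1-2), the following Python fragment derives the coefficient matrix from a transactions matrix and recovers the final demand implied by a given output vector. The function names and the two-sector numbers are hypothetical and serve only as illustration; they are not taken from the BEA data.

```python
def technical_coefficients(T, X):
    """Eq. (2.1-2): A[i][j] = T[i][j] / X[j]."""
    n = len(X)
    return [[T[i][j] / X[j] for j in range(n)] for i in range(n)]


def final_demand(A, X):
    """Eq. (2.1-1): Y = (I - A) X, written out row by row."""
    n = len(X)
    return [X[i] - sum(A[i][j] * X[j] for j in range(n)) for i in range(n)]


# Hypothetical two-sector economy (illustrative numbers only).
T = [[20.0, 30.0],
     [40.0, 10.0]]          # T[i][j]: sales from sector i to sector j
X = [100.0, 200.0]          # gross domestic outputs

A = technical_coefficients(T, X)
Y = final_demand(A, X)      # output not absorbed by interindustry sales
```

In this toy case each element of Y equals the sector's output minus the row sum of T, which is what eq. (2.1-1) requires when A is built from the same T and X.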

2.2  The  Data 

Estimates  of  all  elements  of  the  above  matrices  are  collected  and 
assembled by BEA at the 484 sector level of detail. Before publication,
however, they are aggregated to about 360 sectors. BEA personnel responsible
for this compilation were interviewed; their subjective estimates of
uncertainty on all base year transactions are given in Appendix A.

Before  proceeding  with  the  Monte  Carlo  simulation,  these  data  were 
aggregated  to  90  and  then  to  30  sectors.    The  30  sector  data  base  was  used 
for  the  development  and  verification  of  all  the  computer  programs.   The  90 
sector  data  base  was  used  for  the  main  simulation.   This  degree  of  aggregation 
was chosen for economic reasons (matrix inversion is an expensive $N^3$
operation) and because it corresponds most closely to the most widely
distributed and used version of the BEA input-output tables. These are published
at  the  83  sector  level  of  detail,  while  the  90  sector  model  used  here  retains 
more  detail  in  the  transportation  and  energy  sectors  of  the  economy. 

Aggregating  an  input-output  data  base  is  a  nontrivial  operation  since 
it must be done prior to the operations in (2.1-3). After the three matrices
are aggregated independently and summed to obtain T, X and A are computed
using eqs. (2.1-1) and (2.1-2).

* Names of sectors at each level of aggregation are given in Appendix B.

A  101  sector  data  base  was  also  constructed  by  replacing  the  5  energy 
sectors in the 90 order model by 16 energy supply and service sectors.
The  rationale  and  development  of  the  101  sector  model  are  described  by 
Bullard  and  Sebald  (1975).   Its  purpose  is  to  more  accurately  predict  energy 
consumption  in  future  years  by  explicitly  modeling  fuel  substitutability. 
In  effect,  this  model  recognizes  that  end  uses  of  energy  (space  heating, 
lighting,  air  conditioning,  etc.)  are  less  substitutable  than  the  fuels 
themselves  by  permitting  the  non-energy  sectors  to  purchase  only  end  uses 
of  energy,  while  fuels  are  sold  only  to  the  end  use  sectors.   Note  that 
the  former  are  very  stable  over  time  while  the  latter  are  not.   The  most 
variable  coefficients  or  parameters  involved  in  energy  consumption  are 
thereby  confined  to  the  few  representing  sales  of  fuels  to  the  end  use 
sectors,  which  can  be  estimated  independently  using  models  designed 
expressly  for  that  purpose. 




3.0  METHODOLOGY 

3.1  Point of View for Stochastic Error Analysis

There  are  several  ways  to  interpret  this  problem,  and  the  point  of  view 
affects  both  the  methodology  and  the  interpretation  of  results.  One  way  is 
to  act  as  a  simulator  of  BEA's  activities  from  data  collection  through  matrix 
inversion. In an alternative viewpoint, the analyst attempts an a priori
determination of the effect of mathematical transformations on uncertain
observations. In either case, this information enables the analyst to assess the
usefulness  of  the  data  for  modeling  purposes.  We  have  adopted  the  latter  point 
of  view. 

Within  this  framework,  the  analyst  receives  signals  from  the  economic 
system  associated  with  each  interindustry  transaction  as  well  as  total  output 
and  value  added.  Actually,  each  of  these  signals  from  an  industry  is  the  sum 
of many signals from individual establishments. The signals appear to be
independent; that is to say, the signals tell us little about their correlation.*
The  analyst's  only  information  on  these  correlations  comes  from  accounting 
identities  requiring  income  to  equal  outgo. 

Each signal is characterized by BEA in terms of upper and lower bounds and
a "published value" representing their estimate of where the true value is most
likely to lie. We then characterize BEA's knowledge of the transactions as
random variables. The distributions are inputs for the Monte Carlo analysis, which
transforms them into a set of numbers comprising the solution set. Each element
of the solution set is characterized by a set of statistics** which are then
compared with the deterministic result.


* Due to the size and complexity of the economic system, frequent measurement
is economically prohibitive so no information is available from time series
analysis.

** Mean, variance


Each  input  variable  is  first  sampled  independently,  but  some  effort  is 
then  made  to  assure  that  the  external  balance  conditions  are  satisfied.   This 
is  what  BEA  does  in  their  deterministic  approach,  and  we  make  a  similar 
attempt with our Monte Carlo approach. It is unrealistic, however, to
completely simulate BEA's activities, many of which are judgmental,
undocumented, and not reproducible. The specific shortcuts taken are detailed in
section 3.4.

3.2  Sampling  Random  Variables 

All  of  the  basic  data  (transactions,  industry  output,  final  demands) 
are  characterized  as  random  variables  having  either  normal  or  lognormal 
distributions.* As discussed in Appendix A, Section 4, entries which
have been truncated to zero by BEA are modeled with a "folded normal" random
variable, which is simply the absolute value of a normal random variable
with mean 0. Non-zero cells are modeled using either normal or lognormal
random variables, with the former used in those cases where the published
value is relatively accurate. In situations where the data are less well
known, an analyst will tend to use a multiplicative factor to bound his
estimate rather than an additive error bound. A lognormal distribution is
appropriate in such a case because of its property of multiplicative
symmetry about the median. That is, if $X_0$ is the median of a lognormal random
variable X, then Prob($X > X_0 D$) = Prob($X < X_0/D$) for any factor D. For
example, if an analyst states that his estimate has probability $\alpha$ of being
correct within a factor of D, then a lognormal random variable with $\alpha$ =
Prob($X_0/D < X \le X_0 D$) will be used to model the situation.

* In a few cases a negative entry in the data is modeled by the negative of
a lognormal random variable (which necessarily takes only positive values).
This set of circumstances is handled so much like the usual lognormal case
that it is not discussed separately in what follows.

This  section  outlines  a  procedure  for  sampling  from  random  variables 
such  that 

1)  The  sample  will  be  drawn  from  a  folded  normal,  normal  or  lognormal 
population. 

2)  The distributions will be truncated to prevent samples that are
absurd (e.g., negative transactions). Truncation eliminates
samples in the upper and lower 0.15% tails in the normal and
lognormal cases and in the upper 0.3% tail in the folded normal case.
This corresponds to the percentage of probability outside 3
standard deviations from the mean in a normal population.

3)  The expected value of the sampled result is equal to the published
value, M, of the entry in question (except in the folded normal
case where the published value is zero).

4)  Before truncation, the random variable X from which we sample
has a confidence interval defined by a parameter b, $\delta$ or D.

a.  Folded Normal Case

Prob($X < b$) = .997

(i.e., b amounts to 3 standard deviations of the underlying
normal random variable).

b.  Normal Case

Prob($\mu_X - \delta\mu_X \le X \le \mu_X + \delta\mu_X$) = .997

(i.e., $\delta$ amounts to 3 standard deviations of X expressed as a
fraction of the mean, $\mu_X = M$)

c.  Lognormal Case

Prob($X_0/D < X < X_0 D$) = .997

In all three cases the sampling procedure is based on a standard normal
random variable (i.e., mean = 0 and variance = 1, denoted N(0,1)).*


*  The  standard  normal  random  number  generator  used  was  the  International 
Mathematical  Statistical  Library  routine  GGNRF.   Tests  of  randomness  and 
normality  were  performed  for  verification  purposes  and  are  described  in 
Appendix  D. 


Truncation is achieved by sampling until a value r is obtained which is
less than 3 in absolute value. In the folded normal case we set $\mu = 0$ and
$\sigma = b/3$ so that $\mu + r\sigma$ is a sample from a truncated $N(\mu, \sigma^2)$ variable; the
absolute value then satisfies the conditions for the folded normal sample.
In the normal case we set $\mu = M$, $\sigma = \delta M/3$ and then $\mu + r\sigma$ is used as the
normal sample.

The situation for X lognormal is slightly more complicated. In this
case, $X = e^Y$ where Y is a normal random variable. Let $\mu$ and $\sigma$ be the mean
and standard deviation of Y.

Then the median $X_0$ of X is equal to $e^{\mu}$ so $\mu = \ln X_0$. Therefore,

Prob($X_0/D < X \le X_0 D$) = .997

implies that

Prob($\ln X_0 - \ln D \le Y \le \ln X_0 + \ln D$) =
Prob($-\ln D \le Y - \mu \le \ln D$) = .997

so that

$\ln D = 3\sigma$.

Since we want the mean value to equal the published value,
$\mu_X = e^{\mu + \sigma^2/2} = M$, we must set $\mu = \ln M - \frac{\sigma^2}{2} = \ln M - \frac{\ln^2 D}{18}$.

To summarize, we sample for X in the lognormal case by obtaining a
truncated standard normal random number, r, and setting $X = e^{\mu + r\sigma}$ where
$\sigma = (\ln D)/3$ and $\mu = \ln M - (\ln^2 D)/18$. Comparing the three cases we have:

            X Folded Normal                  X Normal                      X Lognormal

$\mu$       0                                M                             $\ln M - (\ln^2 D)/18$
$\sigma$    b/3                              $\delta M/3$                  $(\ln D)/3$
X           truncated ABS($N(\mu,\sigma^2)$)    truncated $N(\mu,\sigma^2)$    truncated $e^{N(\mu,\sigma^2)}$
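The three sampling rules summarized above can be sketched in Python as follows. This is only an illustration under stated assumptions: the standard library generator `random.gauss` stands in for the IMSL routine used in the study, the function names are hypothetical, and M is taken to be positive in the normal and lognormal cases.

```python
import math
import random


def truncated_std_normal(rng):
    """Resample N(0,1) until the draw r satisfies |r| < 3."""
    while True:
        r = rng.gauss(0.0, 1.0)
        if abs(r) < 3.0:
            return r


def sample_folded_normal(b, rng):
    """Published value zero: mu = 0, sigma = b/3, take the absolute value."""
    sigma = b / 3.0
    return abs(sigma * truncated_std_normal(rng))


def sample_normal(M, delta, rng):
    """mu = M, sigma = delta*M/3."""
    sigma = delta * M / 3.0
    return M + truncated_std_normal(rng) * sigma


def sample_lognormal(M, D, rng):
    """sigma = ln(D)/3 and mu = ln(M) - ln(D)**2/18, so the untruncated mean is M."""
    sigma = math.log(D) / 3.0
    mu = math.log(M) - math.log(D) ** 2 / 18.0
    return math.exp(mu + truncated_std_normal(rng) * sigma)


rng = random.Random(12345)           # fixed seed, for reproducibility only
draws = [sample_lognormal(100.0, 2.0, rng) for _ in range(5)]
```

Because |r| < 3, every normal draw lies within $\delta M$ of M and every lognormal draw lies within a factor D of the median $e^{\mu}$, which is how the truncation rules out absurd samples.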


In the lognormal case the mean is not coincident with the median. To
evaluate the error resulting from assuming they are equal, suppose an analyst
gives a confidence interval for the true value T, in terms of his estimate
M and a factor D. That is,

Prob($M/D < T \le MD$) = .997

where .997 is just the probability spanned by three standard deviations
about the mean in a normal distribution.

We have modeled this situation with a random variable X with
$\mu_X = M$ and

Prob($X_0/D \le X < D X_0$) = .997

We want to show that

Prob($M/D < X \le DM$) is close to .997.

In fact,

Prob($M/D < X < DM$) =
Prob($\ln M - \ln D < Y < \ln D + \ln M$) =
Prob($\mu + \frac{\sigma^2}{2} - 3\sigma < Y < 3\sigma + \mu + \frac{\sigma^2}{2}$) =
Prob($\frac{\sigma}{2} - 3 \le \frac{Y-\mu}{\sigma} < 3 + \frac{\sigma}{2}$)

Since $\frac{Y-\mu}{\sigma}$ is standard normal, we can find this probability in standard
normal tables if we know $\sigma$. For a typical value of D such as D = 8,

$\frac{\sigma}{2} = \frac{\ln D}{6} = .35$

Therefore,

Prob($M/D \le X < DM$) = Prob($.35 - 3 < \frac{Y-\mu}{\sigma} < 3 + .35$) = .995,

indicating that the error resulting from the assumption is negligible.
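The table lookup in this argument can be reproduced numerically. The sketch below evaluates the standard normal distribution function with `math.erf` from the Python standard library; the function names are hypothetical.

```python
import math


def std_normal_cdf(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def coverage_about_mean(D):
    """Prob(M/D < X < D*M) for a lognormal X with mean M and ln(D) = 3*sigma."""
    sigma = math.log(D) / 3.0
    shift = sigma / 2.0              # offset between ln(mean) and ln(median), in sigma units
    return std_normal_cdf(3.0 + shift) - std_normal_cdf(shift - 3.0)


p = coverage_about_mean(8.0)         # the D = 8 case discussed in the text
```

For D = 8 this gives a value close to the .995 quoted above, and as D approaches 1 the result tends to the nominal .997.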


3.3  Aggregating  Random  Variables 

Based on subjective uncertainty estimates made by BEA personnel,
probability distributions were defined at the 360 sector level of detail.
Since simulations were done at the 101, 90 and 30 sector levels, aggregation
was necessary. The means of the aggregated variables are easily obtained,
but specification of the distributions of the aggregate variables is a
nontrivial task which was undertaken in the following way. Since all
transactions, margins, etc. at the 368 order are in fact aggregates of data
obtained initially from individual establishments grouped by 5 or 6 digit
Standard Industrial Classification codes, the specification of a distribution
for these aggregates was a crude assumption in itself. The basis for
specifying the distribution at the 90 sector level is equally subjective,
so we adopt the following convention. Assume that the variance, V, of
each aggregated element is the sum of the variances of all its constituents. If
$3\sqrt{V}$ is less than 40% of the aggregated mean, $\mu$, assign a normal $N(\mu, V)$
distribution to the variable. If $3\sqrt{V}$ is greater than 40% of $\mu$, a lognormal
distribution is assumed. If $\mu$ equals zero, a folded normal distribution is
used. This rule is simply a formalized reproducible characterization of a
subjective assessment of input data uncertainty. It is felt that the
subjective nature of the disaggregated uncertainty estimates did not warrant a
more rigorous approach. For purposes of reproducibility, however, the
adopted algorithm is detailed below.

The  first  step  in  aggregating  is  to  compute  the  variances  of  the  entries 
being  aggregated: 
Case 1:  Folded Normal.  $V = (b/3)^2 \, (1 - 2/\pi)$

Case 2:  Normal.  $V = (M\delta/3)^2$

Case 3:  Lognormal.  $V = M^2 \left[ \exp\left( \frac{\ln^2 D}{9} \right) - 1 \right]$

As indicated above, the variances of the entries being aggregated are
summed to obtain the variance of the aggregate entry. The decision to make
the aggregate entry folded normal, normal or lognormal depends on the
aggregate variance V and the aggregate published value M.

Case 1:  M = 0. Here we assume that the constituent entries were
also published zeros. The parameter b is chosen so that the
variance of the resulting folded normal will equal the given
aggregate variance V. This value is $b = 3\sqrt{V/(1 - 2/\pi)}$.

Case 2:  $M \ne 0$ and $3\sqrt{V}/|M| \le .4$. These entries are modeled as normal
with $\delta$ chosen so that the resulting variance will equal V.
This value for $\delta$ is $\delta = 3\sqrt{V}/|M|$.

Case 3:  $M \ne 0$ and $3\sqrt{V}/|M| > .4$. Here we use a lognormal random
variable specified by the parameter D chosen to be consistent
with V: $D = \exp\left( 3\sqrt{\ln(1 + V/M^2)} \right)$.*
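The aggregation convention can be sketched as a pair of Python functions: one returning the variance of a constituent entry from its distribution parameter, and one choosing the distribution and parameter of the aggregate from M and V. The function names and the string labels are hypothetical; the formulas are those of the three variance cases and the three aggregate cases given above.

```python
import math

FOLD = 1.0 - 2.0 / math.pi           # variance factor of a folded normal


def entry_variance(kind, M=0.0, param=0.0):
    """Variance of one constituent entry; param is b, delta or D."""
    if kind == "folded":
        return (param / 3.0) ** 2 * FOLD
    if kind == "normal":
        return (M * param / 3.0) ** 2
    if kind == "lognormal":
        return M ** 2 * (math.exp(math.log(param) ** 2 / 9.0) - 1.0)
    raise ValueError("unknown distribution: " + kind)


def aggregate_distribution(M, V):
    """Return (kind, parameter) for aggregate published value M and variance V."""
    if M == 0:
        return "folded", 3.0 * math.sqrt(V / FOLD)
    ratio = 3.0 * math.sqrt(V) / abs(M)
    if ratio <= 0.4:
        return "normal", ratio                       # parameter is delta
    return "lognormal", math.exp(3.0 * math.sqrt(math.log(1.0 + V / M ** 2)))


# Two hypothetical normal entries summed into one aggregate cell.
V = entry_variance("normal", 60.0, 0.2) + entry_variance("normal", 40.0, 0.3)
kind, parameter = aggregate_distribution(100.0, V)
```

Each branch of `aggregate_distribution` inverts the corresponding variance formula, so an aggregate made of a single entry recovers that entry's own parameter.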

* See Appendix C for details.

3.3  Constructing the Transactions Matrix

Fig. 3-1 shows graphically the relationship between the matrices of
transactions (T), final demand (FD), imports (M) and gross domestic outputs
(GDO).

$\sum_{j=1}^{N} T_{ij} + \sum_{k=1}^{10} FD_{ik} - M_i = GDO_i$  (3.3-1)

These random variables are sampled from normal or lognormal distributions as
described above. Each element in the first row (i=1) is sampled first
independently, just as BEA analysts receive these values from apparently
independent sources. Since eq. (3.3-1) is an external balance condition that
is not satisfied in general, we force this condition to be satisfied in
much the same manner as BEA does. The lognormally distributed variables in
the row are generally those obtained from unreliable sources or computed
using surrogate variables. Therefore these values are scaled proportionately*
to satisfy eq. (3.3-1).
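One way to read the proportional scaling step is sketched below. It is an illustration under explicit assumptions, not a reconstruction of the program used in the study: the sampled GDO and all normally distributed entries are held fixed, a single common factor is applied to the lognormal entries of the row, and the sign vector (+1 for transactions and final demands, -1 for imports) is a hypothetical device for writing eq. (3.3-1) as one signed sum.

```python
def balance_row(values, signs, is_lognormal, gdo):
    """Scale the lognormal entries of one row so the signed sum equals gdo."""
    fixed = sum(s * v for v, s, ln in zip(values, signs, is_lognormal) if not ln)
    adjustable = sum(s * v for v, s, ln in zip(values, signs, is_lognormal) if ln)
    if adjustable == 0:
        raise ValueError("no lognormal entries available to absorb the imbalance")
    scale = (gdo - fixed) / adjustable   # common factor for the uncertain entries
    return [v * scale if ln else v for v, ln in zip(values, is_lognormal)]


# Hypothetical row: two transactions, one final demand, one import.
values = [30.0, 10.0, 50.0, 5.0]
signs = [+1, +1, +1, -1]
is_lognormal = [False, True, False, False]
balanced = balance_row(values, signs, is_lognormal, gdo=90.0)
```

A negative scale factor would signal a draw that cannot be balanced in this way; how such draws should be treated is not specified here and is left open in the sketch.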

Proceeding in this manner through N rows, a complete data set is
constructed satisfying row constraints. The rows are not independent, however,
because the value of all outputs (GDO) of a sector must equal the value of
all commodity inputs (from the other N sectors) plus "value added" (a term,
VA, accounting for wages, taxes, and profit). VA is measured independently
by federal agencies and provides BEA analysts with another external
condition to satisfy. Their method for satisfying this was too complex to
model,** so a simpler check had to be devised for this Monte Carlo study.

The method employed is based on the response of the BEA's director
of the I-O study to the following question: "If the criterion for terminating
the iterative process of balancing the I-O table were based on uncertainty of
the VA values, how much could be tolerated?" The answer indicated that out
of 90 sectors, at least 88 must be within ±20% of the "true" value.*** If
the condition was not met, the matrix was rejected. This condition was never
violated in the actual simulation.
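The acceptance test can be sketched as follows. The sketch assumes that the published VA figures play the role of the "true" values and that the implied value added of a sector is its GDO minus its column sum of transactions, with all quantities in dollars; both function names are hypothetical.

```python
def implied_value_added(T, gdo):
    """VA_j = GDO_j minus the sum of commodity inputs to sector j (column sum of T)."""
    n = len(gdo)
    return [gdo[j] - sum(T[i][j] for i in range(n)) for j in range(n)]


def va_acceptable(va_simulated, va_reference, tolerance=0.20, max_outside=2):
    """Accept if at most max_outside sectors deviate by more than the tolerance."""
    outside = sum(
        1
        for sim, ref in zip(va_simulated, va_reference)
        if abs(sim - ref) > tolerance * abs(ref)
    )
    return outside <= max_outside


# Hypothetical 90-sector check: two sectors are 30% high, the rest 10% high.
reference = [100.0] * 90
simulated = [130.0, 130.0] + [110.0] * 88
accepted = va_acceptable(simulated, reference)
```

With 90 sectors and `max_outside=2` this reproduces the "at least 88 within ±20%" criterion; for the other levels of aggregation the threshold would need to be chosen separately.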


* In fact, BEA analysts actually estimate many of these uncertain values by
computing the difference between GDO and the sum of the well known (normally
distributed) variables and allocating proportional to some surrogate
variables (e.g. employment).

** In the 1967 input-output study, consistency between row and column sums
was assured by assigning responsibility for individual sectors to different
analysts; after each independently estimated initial row values, the
resulting columns were presented to each analyst for independent
verification. After many iterations and some undocumented judgement decisions,
the "published" values were agreed upon.

*** Philip M. Ritz (1976) Interindustry Economics Branch, Bureau of Economic
Analysis, U. S. Department of Commerce, personal communication.


Next, the terms in eq. (3.3-1) are used to compute the coefficients

$A_{ij} = \frac{T_{ij}}{GDO_j}$

and the Leontief inverse matrix $(I-A)^{-1}$ is finally calculated. Aside from
checking the eigenvalues of A, there is no a priori check that can be performed
to guarantee positivity of the inverse matrix.* Therefore, each inverse matrix
is checked after it is computed to verify that every element is greater than
zero. If it fails the test, all the randomly selected variables T, FD, M,
GDO are discarded and a new set is selected. This is exactly the procedure
employed by BEA. Again, the simulation was completed without this condition
being violated.
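The invert-then-check step can be sketched in pure Python. Gauss-Jordan elimination with partial pivoting is used here only to keep the example self-contained; it is an assumption of the sketch, not a statement about the inversion routine used in the study, and the function names are hypothetical.

```python
def leontief_inverse(A):
    """Return (I - A)^-1 by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    # Augmented matrix [I - A | I].
    aug = [
        [(1.0 if i == j else 0.0) - A[i][j] for j in range(n)]
        + [1.0 if i == j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("I - A is numerically singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def inverse_is_positive(inv):
    """Acceptance test from the text: every element must be greater than zero."""
    return all(v > 0.0 for row in inv for v in row)


A = [[0.2, 0.15],
     [0.4, 0.05]]                    # hypothetical two-sector coefficients
inv = leontief_inverse(A)
ok = inverse_is_positive(inv)
```

A simulated matrix that fails `inverse_is_positive` would be discarded together with the T, FD, M and GDO draws that produced it.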

3.4  Results Saved for Analysis

The simulation described here is expensive from a computational point
of view since matrix inversion is an $N^3$ operation. For this reason, every
simulated Leontief inverse matrix was saved on tape so it would be available
for future analysis if necessary.**

For purposes of this analysis, our attention was focused on the means,
variances and confidence intervals for the elements of $(I-A)^{-1}$ and selected
subsets and linear combinations thereof. To calculate these, it was necessary
to save a set of sufficient statistics on disk after each iteration: the
running sum and the sum of the squares for each element of the following set
of results, which we shall denote by $\Omega$:


* If all variables were expressed in current-year dollars, some a priori
tests are available. In the general case such as this one, where the
energy sector outputs are expressed in physical units, no such tests exist.

** The tape will be delivered to EPRI under separate cover.
1.  The entire $(I-A)^{-1}$ matrix;

2.  The total primary energy intensity vector, e; and

3.  The sector output vector, X.
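The running sum and sum of squares kept for each element of this set are enough to recover unbiased estimates of its mean and variance at any point in the simulation. A minimal sketch for a single scalar result follows; the class name is hypothetical, and one such accumulator per element is assumed.

```python
class RunningStats:
    """Sufficient statistics for the mean and variance of one result element."""

    def __init__(self):
        self.n = 0        # number of simulation runs seen so far
        self.s1 = 0.0     # running sum of values
        self.s2 = 0.0     # running sum of squared values

    def update(self, x):
        self.n += 1
        self.s1 += x
        self.s2 += x * x

    def mean(self):
        return self.s1 / self.n

    def variance(self):
        """Unbiased estimate; requires at least two runs."""
        return (self.s2 - self.s1 ** 2 / self.n) / (self.n - 1)


stats = RunningStats()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:   # illustrative values
    stats.update(value)
```

The sum-of-squares form can lose precision when the variance is small relative to the mean, so double precision accumulation is assumed in this sketch.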

The total primary energy intensity vector is a linear combination of the
energy rows of $(I-A)^{-1}$, and a typical element $e_j$ represents the amount of basic
energy resources required directly and indirectly to produce one unit of output
from sector j for final consumption.* The sector outputs X are computed from
the simulated $(I-A)^{-1}$ matrix using the base year domestic final demands as
weighting factors:

$X_i = \sum_{j=1}^{N} (I-A)^{-1}_{ij} \left( \sum_{k=1}^{10} FD_{jk} - M_j \right)$  (3.4-1)

This is done because I-O models are frequently employed to estimate total sector
outputs corresponding to a specified final bill of goods, and a significant
amount of additional error cancellation may be achieved.

In order to ascertain the nature of the distribution of typical random
variables, each simulated value was saved for some results. The
variables saved were X, e, and the electricity sector row of $(I-A)^{-1}$. Goodness
of fit tests performed on these variables are described in Section 4.

Finally, since most applications of the particular models examined are in
the area of energy policy analysis, it was decided to save sufficient
statistics for recovering covariances of the energy sector rows of $(I-A)^{-1}$. Since
all possible linear combinations were not of interest - only row and column
combinations - storage requirements were considerably reduced. It was sufficient
to save the running sums of products of all pairs of entries appearing together
in such linear combinations. If other combinations are ever needed, they will
be recoverable from $(I-A)^{-1}$ matrices saved on an archive tape as described
earlier.

* The energy rows utilized are those corresponding to coal, crude oil and gas,
and the fossil fuel equivalent of hydro and nuclear electricity:
$e_j = (I-A)^{-1}_{\text{coal},\,j} + (I-A)^{-1}_{\text{crude oil and gas},\,j} + 0.6\,(I-A)^{-1}_{\text{electricity},\,j}$.

With this set of results it is possible to estimate the total energy
requirements to meet arbitrarily specified final demands, and to compute linear
combinations of energy intensities similar to the "total primary" one described
earlier.
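The role of the running sums of products can be illustrated with a small accumulator for one pair of entries. The class and function names are hypothetical; the variance of a linear combination follows from the standard identity Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X, Y).

```python
class RunningCovariance:
    """Running sums for one pair of entries (x, y) that appear in the same combination."""

    def __init__(self):
        self.n = 0
        self.sx = 0.0     # running sum of x
        self.sy = 0.0     # running sum of y
        self.sxy = 0.0    # running sum of the products x*y

    def update(self, x, y):
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxy += x * y

    def covariance(self):
        """Unbiased estimate; requires at least two runs."""
        return (self.sxy - self.sx * self.sy / self.n) / (self.n - 1)


def combination_variance(a, var_x, b, var_y, cov_xy):
    """Variance of a*X + b*Y from the two variances and the covariance."""
    return a * a * var_x + b * b * var_y + 2.0 * a * b * cov_xy


pair = RunningCovariance()
for x, y in [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 9.0)]:   # illustrative values
    pair.update(x, y)
```

Only pairs that occur together in a row or column combination need such an accumulator, which is the source of the storage saving described above.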

3.5  Stopping  Rule 

One  of  the  major  difficulties  associated  with  Monte  Carlo  simulation  is 
knowing how many runs will be required to attain reasonable confidence
intervals on the results of the simulation. There are two major problem areas.
If one is considering whether or not to use Monte Carlo techniques, an
estimate of the required number of runs is crucial to determination of
simulation costs. It may be, for example, that reasonable confidence
intervals may require a prohibitively expensive number of runs. The second
problem  arises  after  the  decision  has  been  made  to  use  Monte  Carlo  methods. 
One  needs  to  know  when  enough  runs  have  been  made. 

In  the  first  problem  area,  present  practice  dictates  running  several 
small  scale  simulations  of  a  similar  nature  to  the  one  of  interest  in  order 
to  be  able  to  extrapolate  the  number  of  runs  in  the  smaller  cases  to  the 
probable runs needed in the larger. In the second area, good statistical
practice dictates that before taking any samples, one must determine how to
stop  sampling  in  a  way  that  doesn't  bias  results.   Executing  additional  runs 
if  the  resulting  confidence  intervals  are  too  large  is  considered  unwise 
since  one  runs  the  risk  of  biasing  the  simulation  results  by  stopping  when 
the  desired  outcome  occurs. 

In  this  section  we  present  a  method  for  determining,  based  on  a  very 
small  number  of  runs,  the  proper  number  of  total  runs  the  simulation  should 
require.   The  method  properly  elucidates  the  cost/benefit  tradeoff  between 
the cost of additional runs and the benefits of increased accuracy.
Information is displayed to the analyst in a way that facilitates his making a
judgement on the proper number of runs to be made. Since this method
is  based  on  just  the  first  few  runs,  biasing  of  the  simulation  is  not  a 
problem.   Based  on  a  very  small  number  of  runs,  it  is  also  a  cost  effective 
way  to  decide  whether  a  Monte  Carlo  analysis  is  economically  feasible. 

In section 3.5.1 we outline the approach used, in section 3.5.2 a
brief sketch of the mathematics involved is given, followed by an example
in section 3.5.3. Mathematical derivations are given in section 3.5.4.

3.5.1  An  Outline  of  the  Approach 

Suppose a relatively small number of simulation runs have been made and
unbiased estimates of the second order statistics of all elements of a set
of results, $\Omega$, have been calculated. Since the estimates are themselves
random quantities, one can determine an interval about each estimate which
contains the population value (e.g. mean or variance) with a certain probability.
These intervals are called confidence intervals (CI) and we shall interest
ourselves in intervals for which the corresponding probability is .95.
Fig. 3.5-1 describes the situation within which one must interpret the
results of a simulation. Generally, then, the simulation output is given in
two ways: the unbiased estimates are tabulated and their confidence intervals
(e.g. 95%) are also given. Our strategy will be to make a few runs and then,
based on the resulting estimates and CI's, determine how many runs the entire
simulation will require. Two major effects occur with increasing sample size.
First, the estimates $\mu$ and $\sigma$ will move around, ultimately converging to the
correct values. Second, the width of the CI's will decrease monotonically to
zero as the number of runs goes to infinity. For purposes of the stopping
rule, we have chosen to quantify the resulting simulation accuracy by
monitoring a histogram of

$B = \frac{3\sigma + U}{\mu}$

for each stopping point considered. The algorithm therefore:

1)  Draws a histogram of the actual B values after an initial number of
runs, $m_1$.

2)  On the basis of the information after $m_1$ runs, draws a histogram of
predicted B values after $m_2$ runs, where $m_2 > m_1$ denotes a possible
stopping point for the full blown simulation.

3)  Repeats step 2 for various $m_2$.

Typical results are displayed in fig. 3.5-2.

3.5.2  A  Mathematical  Overview 

As before, we denote the matrix of simulation output variables by $\Omega$.
Each $X \in \Omega$ has an unknown distribution which is at best only
approximately normal.


The symbols are defined in Figure 3.5-1.

Figure 3.5-1  Mean and Variance Estimates and their Confidence Intervals

$\hat{\mu}$ = unbiased estimate of the mean of the underlying distribution;
a and b are the upper and lower confidence interval lengths for $\hat{\mu}$.

$\hat{\sigma}$ = unbiased estimate of the standard deviation of the underlying
distribution; U and L are the upper and lower confidence interval lengths
for $3\hat{\sigma}$.

Figure 3.5-2  Stopping Rule Output Histograms for the 90 Order Simulation.

We denote the set of samples of each $X \in \Omega$ by $\{x_j\}$. Although
what follows could be done for the unknown density of $X \in \Omega$, this
approach involves inaccuracies in the determination of the fourth central
moment and is computationally quite expensive. Instead we have chosen to
convert each $X \in \Omega$ to a normal random variable Z by the
transformation

$$ z_i = \frac{1}{10} \sum_{j=10(i-1)+1}^{10i} x_j $$  (3.5-1)


Since the samples $x_j$ are independent and identically distributed (iid),
the central limit theorem implies that the $z_i$ are approximately
$N(\mu_x, \sigma_x^2/10)$. All statistical evaluations will be performed on
Z and the results will be back-transformed via (3.5-1) to X. Since the Z's
are normal, the unbiased estimates for their mean and variance are given by
the well known relations [Winkler & Hayes (1970)]

$$ \hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} z_i $$  (3.5-2)

$$ \hat{\sigma}^2 = \frac{\sum_{i=1}^{n} (z_i - \hat{\mu})^2}{n-1} $$  (3.5-3)

where n = m/10 is the number of Z sample points and m is the total number of
runs. We assume that even after the initial $m_1$ runs, the CI around
$\hat{\mu}$ is very small. This has empirically been verified as a valid
assumption and it permits us to evaluate $B^+$ for the Z variables by only
worrying about the upper CI on $\hat{\sigma}$, which we denote by
$\hat{\sigma}_u$. Again due to the normality of the Z's, $\hat{\sigma}_u^2$
is well known to be [Winkler & Hayes (1970)]

$$ \hat{\sigma}_u^2 = \frac{(n-1)\,\hat{\sigma}^2}{\chi^2_{(.975;\,n-1)}} $$  (3.5-4)

where $\chi^2_{(.975;\,n-1)}$ denotes the value in a chi-square distribution
with n-1 degrees of freedom cutting off the upper .975 of sample values.
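The block averaging of (3.5-1) and the estimates (3.5-2) through (3.5-4) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the sample data are synthetic lognormal draws standing in for one output entry, and the chi-square value is obtained from the Wilson-Hilferty approximation rather than from a table.

```python
import math
import random
import statistics


def batch_means(x, size=10):
    """Average each consecutive block of `size` runs, as in (3.5-1)."""
    n = len(x) // size
    return [statistics.fmean(x[i * size:(i + 1) * size]) for i in range(n)]


def chi2_quantile(p, dof):
    """Wilson-Hilferty approximation to the chi-square quantile.

    Assumption: used here only to avoid a table lookup; it is not part
    of the method described in the text.
    """
    z = statistics.NormalDist().inv_cdf(p)
    a = 2.0 / (9.0 * dof)
    return dof * (1.0 - a + z * math.sqrt(a)) ** 3


def estimates(x, size=10):
    """Return (mean, sigma, sigma_u) for the block-averaged variable Z."""
    z = batch_means(x, size)
    n = len(z)
    mu = statistics.fmean(z)          # (3.5-2)
    var = statistics.variance(z)      # (3.5-3), divides by n - 1
    # (3.5-4): the value with .975 of the mass above it is the .025 quantile.
    var_u = (n - 1) * var / chi2_quantile(0.025, n - 1)
    return mu, math.sqrt(var), math.sqrt(var_u)


rng = random.Random(0)
# Hypothetical results for a single output entry over 200 runs.
runs = [rng.lognormvariate(0.0, 0.25) for _ in range(200)]
mu, sigma, sigma_u = estimates(runs)
b_plus = 3.0 * sigma_u / mu
print(f"mu={mu:.3f} sigma={sigma:.3f} sigma_u={sigma_u:.3f} B+={b_plus:.3f}")
```

With 200 runs there are only n = 20 averaged points, so the upper bound on the standard deviation is noticeably larger than the estimate itself; the gap narrows as n grows.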
$B^+$ for the Z variables is then given by

$$ B^+_n = \frac{3\hat{\sigma}_{u_n}}{\hat{\mu}_n} = \frac{3\hat{\sigma}_n}{\hat{\mu}_n} \sqrt{\frac{n-1}{\chi^2_{(.975;\,n-1)}}} $$  (3.5-5)

where the subscript n denotes the number of Z sample points in the
simulation. To evaluate $B^+_{n_2}$ we shall require $\hat{\mu}_{n_2}$ and
$\hat{\sigma}_{n_2}$. Even with relatively small $n_1$,
$\hat{\mu}_{n_2} \approx \hat{\mu}_{n_1}$ since the CI even at $n_1$* is very
short. Such is not the case with $\hat{\sigma}_{n_2}$. We can, however, upper
bound $\hat{\sigma}_{n_2}$ if $\hat{\sigma}_{n_1}$ is known by noting that**

$$ \frac{n_1\,[(n_2-1)\,Q - (n_1-1)]}{(n_2-n_1)(n_1-1)} $$

has an F distribution with $n_2 - n_1$ and $n_1$ degrees of freedom, where
$Q \triangleq \hat{\sigma}^2_{n_2} / \hat{\sigma}^2_{n_1}$. We can then
determine K such that

$$ P\{Q \le K\} = .50 $$  (3.5-6)

and then use

$$ \hat{\sigma}_{n_2} = \sqrt{K}\,\hat{\sigma}_{n_1} $$  (3.5-7)

in (3.5-5). It is clear from (3.5-5) and (3.5-7) that for $n_1$ and $n_2$
fixed, $B^+_{n_2}$ is linearly related to $B^+_{n_1}$ for each Z. The
histogram of Figure 3.5-2 can then be generated by evaluating the actual
histogram for $B^+$ after $n_1$ runs, fixing $n_2$, evaluating the constant
multiplier, C, and multiplying each point in the histogram by C. In
particular,

$$ B^+_{n_2} = C\,B^+_{n_1} $$

where

$$ C = \sqrt{K}\; \sqrt{\frac{n_2-1}{\chi^2_{(.975;\,n_2-1)}}} \Bigg/ \sqrt{\frac{n_1-1}{\chi^2_{(.975;\,n_1-1)}}} $$

and K is given by (3.5-6).

*  $n_i = m_i/10$

** This is proved in section 3.5.4.
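As a rough numerical illustration of the multiplier C, the sketch below computes K from the median of the F distribution, using the relation between K and that median derived in section 3.5.4, and then forms C. Everything named here is an assumption for illustration: the median is estimated by Monte Carlo sampling rather than read from tables, the chi-square values use the Wilson-Hilferty approximation, and the choice $n_1 = 20$, $n_2 = 100$ (200 and 1000 runs) is only an example.

```python
import math
import random
import statistics


def chi2_quantile(p, dof):
    """Wilson-Hilferty approximation to the chi-square quantile (assumption)."""
    z = statistics.NormalDist().inv_cdf(p)
    a = 2.0 / (9.0 * dof)
    return dof * (1.0 - a + z * math.sqrt(a)) ** 3


def median_f(d1, d2, trials=20000, seed=1):
    """Monte Carlo estimate of the median of an F(d1, d2) variable."""
    rng = random.Random(seed)
    draws = []
    for _ in range(trials):
        # A chi-square variable with d degrees of freedom is Gamma(d/2, scale 2).
        num = rng.gammavariate(d1 / 2.0, 2.0) / d1
        den = rng.gammavariate(d2 / 2.0, 2.0) / d2
        draws.append(num / den)
    return statistics.median(draws)


def scale_factor(n1, n2):
    """Return (K, C) for predicting B+ at n2 sample points from n1 points."""
    lam = median_f(n2 - n1, n1)
    k = (lam * (n1 - 1) * (n2 - n1) / n1 + (n1 - 1)) / (n2 - 1)

    def ratio(n):
        return (n - 1) / chi2_quantile(0.025, n - 1)

    c = math.sqrt(k * ratio(n2) / ratio(n1))
    return k, c


k, c = scale_factor(20, 100)
print(f"K={k:.3f} C={c:.3f}")
```

Because K is close to one, most of the predicted improvement comes from the narrowing of the confidence interval on the standard deviation, not from any expected change in the estimate itself.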

The choice of the probability .5 in (3.5-6) is not arbitrary. We are
interested in predicting the histogram of $B^+_{n_2}$ from the histogram of
$B^+_{n_1}$. For each $Z \in \Omega$, define an indicator function $I_Z$ as
follows:

$$ I_Z = \begin{cases} 1 & \text{if } Q_Z > K \\ 0 & \text{otherwise} \end{cases} $$

The number of Z's for which $Q_Z > K$ is then equal to

$$ N \triangleq \sum_{Z \in \Omega} I_Z $$

Provided .50 is used in (3.5-6), the expected value of N is
$E\{N\} = \sum_{Z \in \Omega} E\{I_Z\} = \tfrac{1}{2}|\Omega|$, where
$|\Omega|$ denotes the number of elements in the set $\Omega$. The histogram
of $B^+_{n_2}$ can be predicted from that for $B^+_{n_1}$ if the behavior
around the decile points can be quantified. The fact that
$E\{N\} = \tfrac{1}{2}|\Omega|$ indicates that the set of points around each
decile in the histogram of $B^+_{n_1}$ should behave in the following way.
The deciles of $B^+_{n_2}$ are approximately C times the deciles of
$B^+_{n_1}$, since for large $|\Omega|$ roughly half of the points around
each $B^+_{n_1}$ decile are expected to change by more than a factor of C
with the completion of $n_2$ runs. The other half are expected to change by
less than a factor of C. Provided some degree of independence exists among
the entries, the decile at $n_2$ should therefore be approximately C times
the decile at $n_1$. The final step simply uses (3.5-1) to convert the
$B^+_{n_2}$ histogram of values for the Z variables in $\Omega$ back to a
histogram for the associated X variables. This just amounts to multiplying
the decile values of $B^+_{n_2}$ by $\sqrt{10}$, since
$\sqrt{10}\,\sigma_z = \sigma_x$ and $\mu_z = \mu_x$.
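The decile scaling described above can be illustrated with a short Python sketch. It assumes the multiplier C has already been computed and uses made-up $B^+$ values; the function name and the data are hypothetical and stand in for the actual result set.

```python
import math
import statistics


def predicted_x_deciles(b_z_n1, c, batch=10):
    """Predict the deciles of B+ for the X variables at the later stopping point.

    b_z_n1: B+ values of the Z variables after the initial runs.
    c:      multiplier relating B+ at n2 sample points to B+ at n1 points.
    batch:  number of runs averaged into each Z sample point.
    """
    deciles_n1 = statistics.quantiles(b_z_n1, n=10)   # nine cut points
    # Multiply by C to move to n2, then by sqrt(batch) to go from Z back to X.
    return [c * math.sqrt(batch) * d for d in deciles_n1]


# Hypothetical B+ values for 100 entries of the result set.
b_values = [0.01 * i for i in range(1, 101)]
out = predicted_x_deciles(b_values, c=0.8)
print([round(v, 3) for v in out])
```

Only the decile points are moved; the sketch does not attempt to predict how any individual entry will change, which is consistent with the argument that about half of the entries near each decile change by more than C and half by less.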

3.5.3  An Example

As an example, we shall discuss the actual 90 order stopping rule
results. After 200 runs, the histograms of Figure 3.5-2 were generated.*
As discussed above, $B^+ = 3\hat{\sigma}_u/\hat{\mu}$ was chosen as the
ordinate since it is a useful measure of the variability of each element in
the result set. This figure predicts the variability of results to be
expected after different possible stopping points. The diminishing returns
from increasing the number of runs are evident from a comparison of the
marginal benefit obtained by increasing from 200 to 600 runs with that
obtained by increasing from 1000 to 2000 runs. This format permits the
analyst to decide at which point the marginal benefit no longer justifies
the increased cost. Clearly, this requires a nontrivial judgement by the
analyst. Based on the relatively high cost per iteration in this
simulation, 1000 was chosen as the stopping point.

In  an  attempt  to  quantify  the  accuracy  of  this  stopping  rule,  comparisons 
of  predicted  and  actual  histograms  were  made  at  the  90  order.  Results 
are  given  in  Table  3.5-1.  Similar  tests  were  done  at  the  30  order  where 
actuals  were  compared  with  predictions  based  on  only  100  runs,  and  very 
small  errors  were  observed. 


*Although similar histograms for the total primary vector, e, and GDO
were used in the determination of the proper stopping point, for purposes
of this example we shall concentrate on Figure 3.5-1.

[Table 3.5-1.  Comparison of predicted and actual histograms at the 90 order.]


3.5.4  Mathematical Derivations

We first demonstrate in this appendix that

$$ \frac{n_1\,[(n_2-1)\,Q - (n_1-1)]}{(n_2-n_1)(n_1-1)} $$

has an $F_{(n_2-n_1;\,n_1)}$ distribution, where
$Q \triangleq \hat{\sigma}^2_{n_2} / \hat{\sigma}^2_{n_1}$, and then outline
the method for calculating the K of (3.5-6).

Consider the event $R \triangleq \{Q < K\}$, where

$$ \hat{\sigma}^2_n = \frac{\sum_{i=1}^{n} y_i^2}{n-1} $$

and $y_i = z_i - \hat{\mu}_z$. Then

$$ R = \left\{ \sum_{i=1}^{n_2} y_i^2 < (n_2-1)\,K\,\hat{\sigma}^2_{n_1} \right\} $$

Since

$$ \sum_{i=1}^{n_1} y_i^2 = (n_1-1)\,\hat{\sigma}^2_{n_1} $$

we have

$$ R = \left\{ \sum_{i=n_1+1}^{n_2} y_i^2 < [(n_2-1)K - (n_1-1)]\,\hat{\sigma}^2_{n_1} \right\} $$

Dividing both sides of the inequality by
$(n_2-n_1)\,\hat{\sigma}^2_{n_1}\left(\frac{n_1-1}{n_1}\right)$ we obtain

$$ R = \left\{ \frac{\left(\sum_{i=n_1+1}^{n_2} y_i^2\right)/(n_2-n_1)}{\left(\sum_{i=1}^{n_1} y_i^2\right)/n_1} < \frac{(n_2-1)K - (n_1-1)}{n_2-n_1} \left(\frac{n_1}{n_1-1}\right) \right\} $$  (*)

For large samples, $\hat{\mu}_z$ is much closer to the mean than any given
sample point, $z_i$, and therefore the $y_i$ are $N(0, \sigma_z^2)$ and iid.
Introducing $q_i = y_i/\sigma_z$ in * we obtain

$$ R = \left\{ \frac{\left(\sum_{i=n_1+1}^{n_2} q_i^2\right)/(n_2-n_1)}{\left(\sum_{i=1}^{n_1} q_i^2\right)/n_1} < \frac{(n_2-1)K - (n_1-1)}{n_2-n_1} \left(\frac{n_1}{n_1-1}\right) \right\} $$  (**)

where $q_i \sim N(0,1)$. Since $q_i \sim N(0,1)$ and the $q_i$ are iid, both
the numerator and denominator of the LHS of ** are $\chi^2$ random variables
divided by their respective degrees of freedom, and the LHS of * and ** have
an $F_{(n_2-n_1;\,n_1)}$ distribution [Winkler and Hayes (1970)]. Therefore,

$$ \frac{n_1\,[(n_2-1)\,Q - (n_1-1)]}{(n_2-n_1)(n_1-1)} $$

has an $F_{(n_2-n_1;\,n_1)}$ distribution.

QED

We shall now determine K such that $P\{Q < K\} = .5$. By **, this event is
equivalent to

$$ P\left\{ F_{(n_2-n_1;\,n_1)} \le \frac{n_1}{n_1-1} \cdot \frac{(n_2-1)K - (n_1-1)}{n_2-n_1} \right\} = .5 $$

Therefore we simply find $\lambda$ such that
$P\{F_{(n_2-n_1;\,n_1)} \le \lambda\} = .5$ and find K from

$$ K = \frac{\lambda\,(n_1-1)(n_2-n_1)/n_1 + (n_1-1)}{n_2-1} $$

4.0  ANALYSIS OF RESULTS

The basic results of the simulation will be given for the 90 order
I-O matrix. This includes information on bias relative to published values,
variance measures and their relation to error bounds, the sensitivity of the
results to uncertainties on the variances of the underlying BEA data and the
effects of aggregating to 30 sectors and disaggregating to 101 sectors. As
a prelude, we begin by discussing the goodness of fit tests which were
required to verify some distributional assumptions inherent in the simulation.

4.1  Goodness of Fit

The methodology for the goodness of fit tests was developed by Stephens
(1970), who describes a test for normality based on the Cramer-von Mises
statistic which may be employed when the population mean and standard
deviation are not known. Stephens' test compares a given sample distribution
function to a normal distribution with mean and standard deviation given by
the sample mean and sample standard deviation. Included in Stephens' paper
is a table of significance levels for the statistic given the hypothesis
that the random variable being tested is normal. Thus, a test of normality
may be made by calculating the value of Stephens' statistic for a given
sample and comparing it to the tabulated values which characterize normal
behavior.
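A test of this kind can be sketched as follows. The code computes the Cramer-von Mises statistic of a sample against a normal distribution fitted by the sample mean and standard deviation. The small-sample factor (1 + 0.5/n) is an assumption about the form of the modification used with tabulated points of this type and is not confirmed by the text; the function name and both test samples are hypothetical.

```python
import math
import statistics


def cramer_von_mises_normal(sample):
    """Cramer-von Mises statistic against a normal with estimated parameters."""
    n = len(sample)
    fitted = statistics.NormalDist(statistics.fmean(sample),
                                   statistics.stdev(sample))
    u = [fitted.cdf(v) for v in sorted(sample)]
    w2 = sum((u[i] - (2 * i + 1) / (2 * n)) ** 2 for i in range(n))
    w2 += 1.0 / (12 * n)
    # Assumed small-sample modification before comparison with the table.
    return w2 * (1.0 + 0.5 / n)


n = 100
# A sample lying exactly on normal quantiles, and one on exponential quantiles.
normal_like = [statistics.NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
skewed = [-math.log(1.0 - (i + 0.5) / n) for i in range(n)]

print(f"normal-like: {cramer_von_mises_normal(normal_like):.4f}")
print(f"skewed:      {cramer_von_mises_normal(skewed):.4f}")
```

A value below the smallest tabulated point indicates no evidence against normality at that level, while a strongly skewed sample gives a value well above the largest tabulated point.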

The first series of tests using this method was made to test the
normality of the Z random variables defined by averaging every ten
consecutive sample points obtained for the entries in the simulation
results. In all, 270 of these random variables were tested, one for each
entry in the electric utility sector row of $(I-A)^{-1}$, the total primary
energy vector, e, and the total output vector, X. Table 4-1 shows the upper
tail percentage points calculated by Stephens along with the observed
percentages of the Z random variables which fell into the various
categories.


OBSERVED PERCENTAGE     16.0    12.22    6.29    3.33    1.
NORMAL PERCENTAGE       15.0    10.0     5.0     2.5     1.
STEPHENS' STATISTIC     .091    .104     .126    .148    [illegible]

Table 4-1.  A Comparison of Observed and Theoretical Upper Tail Percentage
Points for Goodness of Fit Tests on the Random Variables Z.


For example, Stephens predicts that 10% of all normal samples will achieve
sample statistics larger than .104; we observed 12.2% above that mark.
Even if the 270 random variables being tested are interdependent, the
expected value of the observed percentages should equal the theoretical
percentages if the normality hypothesis is satisfied. Thus the results are
very reassuring and seem to justify treating the averaged variables as
normal.

A second series of tests was undertaken to examine the distributional
properties of the raw data for the same 270 entries. In the absence of
averaging there is little reason to suspect that these random variables are
normal. However, the results were surprising in that very many of the 270
sample statistics were small and therefore indicate good fit to a normal
distribution curve. Those entries that displayed decidedly non-normal
behavior were virtually all unimodal but slightly skewed to the right. It is
interesting to conjecture why some entries seem to be roughly normal while
others are not; perhaps in the process of inversion some elements of
$(I-A)^{-1}$ get a better mix of elements of the A matrix. At any rate it is
useful to know that the entries are all more or less unimodal and symmetric.
If such is the case then $3\sigma$ may be conveniently employed as an error
bound on the distance from the mean, $\mu$. While Chebychev's inequality
guarantees that $\mu \pm 3\sigma$ contains at least 89% of the total
probability in an arbitrary distribution, this percentage rises to 99.7 in
the normal case. Presumably the percentage is also high for any random
variable whose density function is roughly unimodal and symmetric. For all
but one of the entries examined here, at least 99% of the sample points fell
within three sample standard deviations of the sample mean. Thus, $3\sigma$
may be thought of as an approximate bound on deviation from the mean for the
entries in the simulation results, even if many of those entries are not
very close to being normal.
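The two coverage figures quoted above follow from standard results and can be checked with a few lines of Python. This is only a check of the arithmetic, using the standard library's normal distribution; it says nothing about the simulation entries themselves.

```python
from statistics import NormalDist

k = 3.0
# Chebychev: at least 1 - 1/k^2 of the probability lies within k standard
# deviations of the mean, for any distribution with finite variance.
chebychev_lower_bound = 1.0 - 1.0 / k**2
# Normal case: exact probability within k standard deviations.
normal_coverage = NormalDist().cdf(k) - NormalDist().cdf(-k)

print(f"{chebychev_lower_bound:.3f} {normal_coverage:.4f}")  # 0.889 0.9973
```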

4.2  Confidence Intervals

This section discusses the precision of the sample statistics obtained
for various simulation results in light of the goodness of fit tests just
discussed. Because the Z variables are approximately normal, standard
techniques may be used to derive confidence intervals for the mean and
variance of a Z variable and hence for the mean and standard deviation of
the associated entry. After 1000 inversions, a 97.5% upper confidence
bound $\hat{\sigma}_u$ on the standard deviation $\sigma$ of an entry is
given by $\hat{\sigma}_u = 1.16\,\hat{\sigma}$. Thus, $\hat{\sigma}$ is a
fairly good estimate of $\sigma$ for any given entry.
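The factor of 1.16 can be checked from (3.5-4). As a worked sketch, assume the 1000 inversions are averaged in blocks of ten as in section 3.5.2, so that n = 100, and take the standard table value of about 73.36 for the chi-square point with 99 degrees of freedom that has .975 of the distribution above it:

```latex
\frac{\hat{\sigma}_u}{\hat{\sigma}}
  = \sqrt{\frac{n-1}{\chi^2_{(.975;\,n-1)}}}
  = \sqrt{\frac{99}{73.36}}
  \approx 1.16
```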

The confidence intervals on the sample means are even smaller. In more
than 90% of the entries in the inverse the population mean is within 2% of
the sample mean with 95% confidence. All the entries of e and X are
accurate to within 1% with 95% confidence.

4.3  Variability of the Elements in the Result Set

Histograms of $3\sigma/\mu$ were prepared in order to show the relative
amount of variability in the entries of the results set. Three such
histograms, one for the whole inverse, one for e and one for X, are
displayed in Figure 4-1. For half of the entries of the inverse,
$3\sigma/\mu$ is less than 20% while virtually all the entries of e and X
have $3\sigma/\mu$ less than 20%. The above discussion of confidence
intervals suggests that these histograms would not change substantially if
the sample statistics were replaced by the population means and standard
deviations. Since these entries are roughly unimodal and symmetric, the
histograms may then be taken as a good measure of the variability in the
entries of the various subsets of the results. The large decrease in
variability from the elements of the inverse to the elements of X suggests
that significant error cancellation occurs as linear combinations of many
I-O coefficients are computed.

In addition to those discussed above, histograms for
$(3\hat{\sigma}_u + \mu - p)/p$ and $(3\hat{\sigma}_u + p - \mu)/p$, where
p = published value, were also computed in order to relate p to the upper
and lower bounds on the uncertainty in an entry. Because $\mu$ is generally
very close to p and because $\hat{\sigma}_u$ is only slightly larger than
$\hat{\sigma}$, these histograms are very similar to the histograms for
$3\sigma/\mu$ except that the values are all slightly larger.

4.4  Bias on Elements of the Result Set

In standard statistical language, bias is usually defined as the
difference between the mean of an estimator and the true value of the
quantity to be estimated. We use the term in a fundamentally different way
to denote the difference between the mean of the simulation output variables
and their corresponding published values. The mean values of the assumed
distributions of each element of the transactions matrix are equal to their
respective published values. One important result of the simulation is to
determine the bias introduced by normalization and inversion in passing from
the transactions matrix to $(I-A)^{-1}$.

Fig. 4-2 details histograms of the ratio of sample mean to published
value, $\mu/p$, for three important and disjoint subsets of the result set,
viz. the vectors of total output X and total primary energy intensity e and
the entire inverse $(I-A)^{-1}$. Three aspects are noteworthy:

1)  Nearly all $\mu$ cluster within 2% of their published values.

2)  Within this cluster, $\mu$ tends to have a positive bias more often
than a negative one.

3)  Essentially none of the $\mu$ fall below 98% of their respective
published values while, especially in the inverse, a small number of
$\mu$ range well above the published value.

The reason for this positive bias is unclear. The best explanation
may be that transactions reported by BEA as zero were assigned a small
positive value in the simulation to account for the fact that no
transaction is known to be exactly zero (see Appendix C). The large
percentage excess over the published value may result for the same reason,
since an inverse element may be affected (percentagewise) quite
significantly if its corresponding direct coefficient $A_{ij}$ changes from
zero to some finite value.

4.5  Sensitivity of Simulation Results to Assumptions on Input Uncertainties

Since the variances assigned to input quantities such as the
transactions matrix, FD and GDO are only estimates of the true variances,
simulation results have meaning only if small changes in these assumed
variances do not cause very large changes in simulation outputs. This
sensitivity to changes in input variances was investigated by repeating the
simulation with the standard deviations of all normal quantities doubled and
dispersion factors on lognormal inputs doubled. Three major effects were
noted:

1)  The ratio of CI to $\mu$, where CI is the length of the 95% confidence
interval for the mean, was doubled by the factor of two increase
in standard deviation.

2)  The ratio of $3\sigma$ to $\mu$ doubled on the average by doubling the
input standard deviations.

3)  Increasing the input variability made the biases slightly more
negative. This is thought to be the result of increasing simulation
sensitivity to the larger elements of the transactions matrix and
decreasing relative sensitivity to the smaller elements discussed
in section 4.4.

Since  output  uncertainties  only  doubled  with  a  factor  of  two  increase 
in  input  uncertainties,  the  simulation  is  probably  very  stable  with  regard 
to  assumptions  on  input  variances. 

The absolute magnitudes of these results with doubled input
uncertainties may be useful in assessing the general viability of I-O
results applied far beyond the base year (the uncertainty of base year
parameters increases over time). Moreover, if institutional factors make it
unlikely (as some claim) that government can fairly estimate uncertainty of
its own data, then these results show the effect of a 50% underestimate of
the actual uncertainty.

4.6  The Effect of Aggregation

The effect of aggregating the 90 order model to 30 order was analyzed
for two reasons:

1)  It was felt that although variances at the 30 order were smaller
than those of the 90 order due to the aggregation, more error
cancellation should exist at the 90 order where more input elements
combined to form elements of X, e and $(I-A)^{-1}$.

2)  Since much I-O work is done at the 360 order, it is of interest
to determine whether the expansion of the simulation to 360 order
would likely require more than the 1000 runs used in the 90 order
case.

Aggregation produced effectively no change in the simulation output
uncertainties:

1)  The ratio of $3\sigma$ to $\mu$ remained virtually unchanged by the
aggregation. This implies that the two effects mentioned above virtually
cancel one another.

2)  The already very small biases of Fig. 4-2 were made slightly more
negative by aggregating to the 30 order.

3)  Since the ratio $\hat{\sigma}_u/\hat{\sigma}$ is a function only of the
number of simulation runs, it is unaffected by aggregation.

These results give no indication that more than 1000 runs would be needed
in the 360 sector case.

4.7  Results for the 101 Sector Model

As discussed in Section 2.2 above, the purpose of the 101 order model
is to trade increased base year uncertainty for increased parametric
stability over time. The purpose of the 101 order simulation was to measure
the increase in base year uncertainty over the 90 order model. Comparison of
90 order and 101 order histograms for $\mu$/published, CI/$\mu$ and
$3\sigma/\mu$ indicates virtually no change in $(I-A)^{-1}$, GDO and e and a
slight increase in $3\sigma/\mu$ for the energy related rows of
$(I-A)^{-1}$. In particular, at the 90 order, 95.6% of the elements of the
total primary energy intensities had $3\sigma/\mu < .15$ while at the 101
order, 94% were less than .15. This indicates a rather low cost for
increased stability over time. This low cost is thought to be due to the
fact that there are many more elements of the transactions matrix which are
important to the energy embodied in a particular sector output.* Even though
the uncertainty of each of these elements is greater in the 101 case, more
of them combine so greater error cancellation occurs.

*Consider natural gas to auto manufacturing. Natural gas is sold to perhaps
eight energy products which in turn are sold to the automobile sector.

APPENDIX A.  BASE YEAR UNCERTAINTY ESTIMATES
Clark W. Bullard

A.1  BEA DATA

A.1.1  INTRODUCTION

Input-output data form the basis for most structural analyses of the
U.S. economic system. The massive tables of data provide a complete and
internally consistent set of linear production functions for all sectors
of the economy. But surprisingly, these analytical objectives and
applications do not guide the efforts to acquire the data and compile the
tables. Actually the input-output tables are constructed as a bridge
between the national income and product accounts for selected base years in
order to provide a "benchmark GNP" estimate for those years.

It is important for the analyst using input-output (I-O) data for
structural analyses to view the data from this perspective. Since it was
not acquired primarily to support structural economic analyses, it places
additional burdens on the analyst to verify the data's usefulness and
relevance to his particular application.

Consider the most general type of application, where the analyst
wants to predict sector outputs X needed to produce a final bill of
goods Y. To do this he premultiplies Y by the Leontief inverse matrix
$(I-A)^{-1}$ which is calculated from a matrix of direct coefficients A for
a (prior) base year input-output table. The problem the analyst must
address is: What is the uncertainty $\Delta X$ on the result X given
uncertainty $\Delta A$ in the input-output coefficients? Actually
$\Delta A$ may result from 1) base-year measurement error and 2) changes in
the actual A since the base year. In this paper we are concerned only with
the former.

Data for 368 sectors are published by the U.S. Department of Commerce (1974a).

For a more detailed discussion of input-output analyses see Leontief (1941).
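The question posed above can be made concrete with a toy two-sector example. The sketch below computes $X = (I-A)^{-1} Y$ and then shows how a change in one direct coefficient propagates to X. All numbers are hypothetical and chosen only for illustration; they are not taken from the BEA tables, and the function names are not from the report.

```python
def leontief_inverse_2x2(a):
    """Return (I - A)^-1 for a 2x2 matrix of direct coefficients A."""
    m = [[1 - a[0][0], -a[0][1]],
         [-a[1][0], 1 - a[1][1]]]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def outputs(a, y):
    """Sector outputs X needed to deliver the final bill of goods Y."""
    inv = leontief_inverse_2x2(a)
    return [inv[0][0] * y[0] + inv[0][1] * y[1],
            inv[1][0] * y[0] + inv[1][1] * y[1]]


A = [[0.2, 0.3], [0.4, 0.1]]              # hypothetical direct coefficients
Y = [100.0, 50.0]                         # hypothetical final demand
X = outputs(A, Y)

A_perturbed = [[0.2, 0.33], [0.4, 0.1]]   # one coefficient raised by 10%
X_perturbed = outputs(A_perturbed, Y)
delta_x = [xp - x for xp, x in zip(X_perturbed, X)]

print([round(v, 2) for v in X], [round(v, 2) for v in delta_x])
```

Even in this small case an error in a single coefficient changes both sector outputs, because the inverse mixes every element of A into every element of the result.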

Quantitative methods for treating this general error analysis problem
are of three types. The first uses the condition number of (I-A) to obtain
a bound on the norm of $(I-A)^{-1}$, resulting in an extremely conservative
upper bound on parametric error magnification. An expression for true
maximum upper bound on $(I-A)^{-1}$ was derived by Sebald (1973), and the
relative importance of certain parameters to specific applications was
determined. Even the true upper bound, however, was quite conservative
in that it did not account for the (likely) possibility of error
cancellation.

This report is limited to presentation of uncertainty estimates on
I-O data used in calculating the direct coefficients A.

A.1.1.1  Sources of Error

Uncertainty in the I-O coefficients is related directly to several sources
of error in estimating interindustry transactions for the base year. Due
to the exhaustive nature of I-O data, it originates from a variety of sources
ranging from census questionnaires to judgemental guesses. Morgenstern (1950)
has categorized the various sources of error in economic data and most of
his observations are relevant here. The total uncertainty on a particular
transaction "measurement" will include effects of incomplete census
coverage, reporting errors due to misunderstandings or outright lying,
sampling errors inherent in surveys of firms, transcription or key punching
errors, the possibility that forms are lost, classification errors (matching
firms and products to SIC codes), and last but certainly not least, the
problem of separating companies from establishments in processing returns
from surveys or censuses.

A.1.1.2  Effects of Scale

The scale of this problem is what makes it unique. Due to the size
and complexity of the system being modeled (the U.S. economy) measurements
can be taken only at infrequent intervals and at great expense. Moreover,
it takes whole institutions to obtain the measurements (e.g. the U.S. Census
Bureau) so the user of the data is generally not the one who acquired it.
Thus the burden borne routinely by persons who play the roles
of data-taker and analyst is now split among bureaucracies. Part of that
burden (responsibility for estimating parametric uncertainty and its effects
on analyses) is sometimes never borne because of the way the roles and
responsibilities of the bureaucracies are defined.

The  mission  of  the  Census  Bureau  is  to  produce  statistics;  the  Bureau 
of  Economic  Analysis  (BEA)  takes  these  and  others  and  produces  accounting 
tables  supporting  a  benchmark  GNP  estimate.   The  analyst  would  like  to 
take  these  statistics  and  interpret  them  as  observations  of  a  physical  sys- 
tem whose  structure  he  would  like  to  model.  The  statistics  are  often 
published  in  terms  of  5  or  10  significant  figures,  but  none  of  the  hundreds 
or  thousands  of  persons  involved  in  deriving  a  statistic  are  responsible 
for estimating and documenting its uncertainty.

A.1.2 METHOD

Recognizing that actual measurements of interindustry transactions
and other variables are made in the presence of "noise" (error sources),
and that frequent measurements are impractical, we must rely on subjective
estimates of uncertainty. Such estimates are best obtained at the level
of detail at which the measurements are taken, but here too a compromise
must be made. A single transaction in an I-O table may be the sum of millions
of individual measurements of physical quantities; this report is based
on interviews with personnel at BEA, Census, and other agencies near the
top of this statistical pyramid.

A.1.2.1 Quantities Estimated

Uncertainty estimates were obtained on the three basic constituents of
the interindustry transactions matrix. These were direct allocations,
margins on domestic transactions, and transfers.* Independently, estimates
were obtained for final demands, gross domestic outputs, and imports and
exports.

In  the  next  section,  uncertainty  estimates  will  be  given  for  each  of 
these  categories  of  data. 

A.1.2.2 Degree of Detail

Within the scope of this study it was possible to consider data inputs
to the I-O tables at the 484-sector level of detail in many cases, and at
the 368-sector level for the remainder. At the more detailed level, a
magnetic tape was available from BEA which included notes for various direct
allocations indicating the source of the data and the magnitude of the
figure obtained from that source. This tape was scanned for notes
identifying entries from the Census Bureau or other sources deemed equally
accurate. If more than 75% of the entry in the 368-order Direct Allocations
matrix was from one of these sources, it was assigned the same uncertainty
as census data. Estimates of uncertainty for all other data were made at
the 368-level of detail as described in the next section.

*Precise definitions of these terms are given by the U.S. Department of
Commerce (1974b).

A.1.2.3 Interview Techniques

Many  agency  personnel  seemed  well-prepared  and  sometimes  even  anxious 
to  assign  quantitative  estimates  of  uncertainty  to  the  statistics  for  which 
they  were  responsible.   Others  were  quite  reluctant,  citing  the  fact  that 
the  "correct"  answer  was  not  known  and  only  one  measurement  had  been  taken 
so  there  was  inadequate  information  on  which  to  base  an  answer.  While  this 
latter  group  was  probably  more  correct  in  their  assessment  of  the  situation, 
it  should  be  remembered  that  such  a  statement  could  be  used  as  an  "excuse" 
for  covering  up  error  levels  that  might  reflect  badly  on  one's  job  perfor- 
mance.  In  virtually  every  case,  those  interviewed  responded  with  a  quanti- 
tative answer  to  a  question  of  the  form  "If  God  appeared  and  told  the 
correct  number  to  the  commander  of  a  firing  squad,  and  if  that  commander 
asked  you  to  estimate  error  bounds  for  your  published  figure  and  threaten- 
ed to  kill  you  if  the  correct  figure  lay  outside  the  bounds ...  What  would 
you  estimate?" 

During the course of interviews with persons relying on the same data
sources, and with persons responsible for producing that source data, I was
able to arrive at what I believe to be an internally consistent set of
uncertainty estimates. All results presented in this report may be attributed
to the author, although footnotes are used to identify the persons whom I
interviewed to obtain information and impressions. Given the nature of the
strong institutional pressures for downward bias in these estimates, I
do not expect that the pressures for conservatism that I offered in
phrasing my interview questions provided a significant counteracting force.

The Census Bureau and equally accurate sources referred to in Sec. A.1.2.2
are Minerals Yearbook, Census of Mineral Industries, Census of Manufactures
Table 7A, Census of Transportation, Census of Business, Interstate Commerce
Commission, and Civil Aeronautics Board publications.


A.1.2.4 Bias

It is expected that uncertainty estimates obtained from such "top of
the  pyramid"  interviews  will  be  biased  downward,  since  a  BEA  employee  (say) 
will  be  reluctant  to  question  the  Census  Bureau's  estimate  of  the  total 
U.S.  steel  production  unless  he  has  conflicting  statistics  from  somewhere 
else.   Since  the  Census  Bureau  has  a  virtual  monopoly  on  such  statistics, 
the  latter  situation  is  impossible;  since  the  BEA  employee  has  barely  the 
resources  to  do  his  own  job,  he  cannot  begin  to  duplicate  the  efforts  of 
the  Census  Bureau  so  the  former  situation  never  arises  either.   Simply 
stated,  if  one  bureaucracy  publishes  a  seven-significant-figure  statistic 
that  cost  a  million  dollars  to  derive,  the  humble  bureaucrat  in  another 
agency,  with  his  own  problems  to  worry  about,  is  unlikely  to  seriously 
challenge  the  figure. 

Possible  treatments  for  this  problem  of  bias  will  be  discussed  in 
the  last  section. 

A.1.2.5 Effect of Numerical Magnitude

Development of I-O data involves much work within established, or
relatively well-known, control totals. For this reason, and since the
work is done primarily within an accounting framework, the largest numbers
usually receive the most attention and are the best known, and the "residual"
between the well-known components and the control total is often distributed
among other categories using some kind of estimation algorithm. The only
exception to this general "rule" occurs when the figure involved has a
significant impact on the value of GNP; then, though small, the figure may
become the subject of further analysis and refinement.

*See Morganstern (1950).

A.1.3 UNCERTAINTY ESTIMATES

In  this  section,  estimates  will  first  be  presented  at  the  368-sector 
level  of  detail.   This  was  the  level  of  disaggregation  at  which  most  of  the 
persons  interviewed  were  most  comfortable  in  assigning  their  subjective 
estimates  of  uncertainty. 

As  indicated  earlier,  all  estimates  of  upper  and  lower  bounds  presented 
here  may  be  attributed  to  the  author.  The  discussion  and  footnotes  indicate 
the  source  of  my  impressions  and  information. 

Estimates of upper and lower bounds are given in two ways. The first
is a fraction δ which denotes symmetric bounds around the published value
of ±100δ%. The second, applied in cases where the published value is
less well known, is the factor D which when multiplied by the published
value gives the upper bound, and whose inverse determines the lower bound.
All bounds should be taken to represent a 99.7% confidence level.
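
The two bound conventions can be illustrated with a short sketch. The function names and the example figure of 200 (say, $ million) are hypothetical and chosen only for illustration; the arithmetic follows the definitions of the symmetric fraction and the factor D given above.

```python
def symmetric_bounds(value, delta):
    """Bounds of +/- 100*delta percent around the published value."""
    return value * (1.0 - delta), value * (1.0 + delta)


def factor_bounds(value, d):
    """Lower bound is value / D; upper bound is value * D."""
    return value / d, value * d


# Hypothetical published transaction of 200 (illustrative only).
published = 200.0
print(symmetric_bounds(published, 0.05))  # well-known entry, delta = .05
print(factor_bounds(published, 2.0))      # poorly known entry, D = 2
```

Note that the factor form is asymmetric in absolute terms: with D = 2 the upper bound lies twice as far from the published value as the lower bound does, which suits small entries that cannot fall below zero.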

A.1.3.1 Direct Allocations

"Good" Census-grade entries.* All transactions from one manufacturing
sector to another are assigned δ = .05, as are all other interindustry
direct allocations obtained from Census Bureau sources. This figure is
based on interviews with Census Bureau personnel who feel their techniques
for circumventing problems associated with less than 100% coverage are
well within these limits, and that internal cross-checks minimize
reporting and related errors. The largest source of error here is suspected
to be classification error: matching products and firms to SIC codes.

*This information is based primarily on interviews with Kenneth Hanson,
Richard Chassey, Ruth Runyan, and Patrick Duck of the Census of Manufactures,
Industry Division.

Agriculture sector rows.* Based largely on crop reporting surveys;
estimate δ = .10 except for certain transactions noted elsewhere
(e.g. government final demand).

Agriculture sector columns. Inputs from real estate, chemicals, and
chemical fertilizer mining are known best from surveys and other sources;
estimate δ = .10. Directly allocated inputs from transportation and trade
sectors were treated the same as margins, as described in Sec. A.1.3.4. All
other entries are based at least in part on farm expenditure surveys taken in
1955; assume D = 2 for all entries greater than 1% of gross domestic
output for the sector. All smaller nonzero numbers are scaled from
D = 2 → D = 10 as described in Sec. A.3.

Federal government purchases. For both defense and non-defense
purchases, the following assumptions apply: new construction inputs are
based on a good data source, so assign δ = .05; maintenance and repair
construction is more subject to classification errors, so δ = .10.
All entries between $10 million and $50 million are assigned δ = .30
unless otherwise specified below. Purchases less than or equal to
$10 million are assigned D = 2 → 10 as discussed in Sec. A.3.


*Based primarily on interviews with Jerry Schluter, U.S. Department of
Agriculture.

Based primarily on interviews with Roy Seaton, Bureau of Economic Analysis.

Defense purchases are generally better known, due to more complete
source data. Inputs from manufacturing sectors are assigned δ = .10
if they exceed $50 million. Transportation inputs were derived from
outdated formulae that applied poorly to the Southeast Asia situation
in 1967 and are assigned δ = .50. Other non-manufacturing inputs were
assigned δ = .10 if they were above the $50 million threshold.

Non-defense purchases of inputs from non-manufacturing sectors were
less well known, and were assigned D = 3 if they exceeded one percent of
total inputs and D = 3 → 10 if they were smaller. Manufacturing inputs below
the $50 million threshold were treated the same. Transportation inputs
were assigned δ = .30.

State and local government purchases. For health, welfare, education,
and sanitation purchases, new construction and real estate inputs are
assigned δ = .05 since they are obtained from census sources. Together with
wages, these inputs account for nearly 75% of all inputs. Other inputs
are assigned δ = .25 if they exceed 1% of total inputs, and D = 1.5 → 10
as per Sec. A.3 if they are equal to or smaller than 1%.

For public safety purchases, new construction and real estate are
assigned δ = .05. Maintenance construction is known poorly; D = 1.5.
Manufactured inputs greater than $2 million are assigned D = 1.5, and
smaller inputs D = 1.5 → 10. Non-manufactured inputs are assigned D = 1.5 for
those greater than $10 million, and D = 1.5 → 10 for the smaller ones.

Other state and local government purchases are also assigned δ = .05
for new construction and real estate, but also δ = .05 for maintenance
construction since it is primarily highway maintenance, which is a Census
number. Manufactured inputs greater than $5 million are assigned D = 1.5,
and smaller figures D = 1.5 → 10 as per Sec. A.3. Non-manufactured inputs
greater than $50 million are assigned D = 2, and smaller inputs D = 2 → 10.

Imports and exports.* Trade data for commodities (BEA sectors 1.00 -
64.00) are obtained from Census sources and are assigned δ = .05.
Transportation and wholesale and retail trade data, including margins, were
assigned δ = .25. Data on other items (services, etc.) involved in
international trade were assigned D = 2, since they were obtained from
balance of payments sample data. Small entries at the 368-sector level of
detail, representing less than 1% of gross imports or exports, were assigned
D = 2 → 10 as per Sec. A.3.

Inventory change. These figures are in general the least accurate
of all final demand entries, and were assigned δ = .20 for manufactured
goods and δ = .40 elsewhere.

*Based primarily on interviews with Robert Mangen, Bureau of Economic
Analysis.

"All other" direct allocations. Within the scope of this study it was
impossible to identify those responsible for most entries in the
input-output tables. Having taken care of most entries through interviews
described above, the remainder were handled as a group. The algorithm was
designed to assign very tight tolerances to any transaction comprising a
high percentage of total outputs or inputs, and to any sector's output
which "by definition" had to be assigned to a particular cell. For example,
the algorithm had to assign a very tight tolerance to sales from new
residential construction to gross private capital formation, so it would be
compatible with the tolerance assigned to that sector's gross domestic
output. There are numerous other instances where census data might identify
sales of butter to food processors or bakers, and the remainder is
attributed to personal consumption expenditures. On the other hand, very
small-magnitude transactions were assigned high uncertainty for the reasons
discussed earlier.

The algorithm defined two fractions for each direct allocation:
an input fraction, by normalizing with respect to the gross domestic
output of the consuming sector; and an output fraction, by normalizing
with respect to the gross domestic output of the producing sector. The
algorithm proceeds with these tests in the following order, assigning
δ or D when the first condition is satisfied: if both fractions exceed
.95, then δ = .01; if only one exceeds .95, then δ = .02; if both exceed
.80, δ = .05; if only one exceeds .80, δ = .10; if either fraction
exceeds .05, then δ = .20; if either exceeds .01, then D = 1.5. If
both are smaller than .01 it assigns D = 2 → 10 as per Sec. A.3.
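
The cascade of tests above can be sketched as follows. This is a minimal illustration rather than the original program: the function name, the tuple return format, and the representation of the D = 2 to 10 case as an unresolved pair of limits are assumptions, since the scaling rule itself belongs to Sec. A.3 and is not reproduced here.

```python
def assign_tolerance(input_fraction, output_fraction):
    """Return (kind, value) for one "all other" direct allocation.

    kind is "delta" for a symmetric fractional bound, "D" for a
    multiplicative factor, or "D_range" when the factor still has to be
    scaled between two limits by the rule of Sec. A.3 (not shown here).
    """
    larger = max(input_fraction, output_fraction)
    smaller = min(input_fraction, output_fraction)
    if smaller > 0.95:      # both fractions exceed .95
        return ("delta", 0.01)
    if larger > 0.95:       # only one exceeds .95
        return ("delta", 0.02)
    if smaller > 0.80:      # both exceed .80
        return ("delta", 0.05)
    if larger > 0.80:       # only one exceeds .80
        return ("delta", 0.10)
    if larger > 0.05:       # either exceeds .05
        return ("delta", 0.20)
    if larger > 0.01:       # either exceeds .01
        return ("D", 1.5)
    return ("D_range", (2.0, 10.0))


# Hypothetical cell: 97% of the producer's output, 10% of the consumer's inputs.
print(assign_tolerance(0.10, 0.97))
```

Because the tests are ordered from tightest to loosest, taking the first match reproduces the "first condition satisfied" rule without any further bookkeeping.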

A.1.3.2 Gross Domestic Output*

These figures are the best known because they are from the Census
or other equally reliable sources (e.g., IRS) and are assigned δ = .01.
The largest errors here probably stem from classification problems and
possible confusion between company and establishment-based data.

*Based primarily on interviews with Gene Roberts and Phil Ritz, Bureau of
Economic Analysis, and with Kenneth Hanson, Census of Manufactures Industry
Division.

A.1.3.3 Transfers

If both the row and column sectors were manufacturing sectors, the
source of this data was the Census Bureau, but the accuracy was less than
that of direct allocations; assign δ = .20. All other transfers were
assigned upper and lower bounds in the same manner as the corresponding
cell in the direct allocations matrix.

A.1.3.4 Margins

Transportation margins, by product type and mode, are obtained as
totals and then prorated proportional to producers' prices across all
purchasers of that commodity. Then margins in each input are summed for
each purchaser and added to the directly allocated inputs. For all
transport modes, δ = .25 was assigned to the margins. Wholesale and
retail trade margins may be expected to be more variable, and are
sometimes computed as percentage markups over the already estimated
transport margins. Therefore they are assigned δ = .35.
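
The prorating step can be illustrated with a small sketch. The function name, the purchaser labels, and the dollar figures are hypothetical; the only rule taken from the text is that a margin total is split across purchasers in proportion to their purchases at producers' prices.

```python
def prorate_margin(total_margin, purchases):
    """Split a margin total across purchasers in proportion to their
    purchases of the commodity valued at producers' prices."""
    total = sum(purchases.values())
    if total <= 0:
        raise ValueError("purchases must sum to a positive amount")
    return {name: total_margin * value / total
            for name, value in purchases.items()}


# Hypothetical commodity: 100 of sales at producers' prices and a
# transport margin total of 10 to be distributed.
purchases = {"bakeries": 60.0, "households": 40.0}
shares = prorate_margin(10.0, purchases)
print(shares)
```

Each purchaser's share would then be added to its directly allocated input from the transport sector, so any error in the margin total is spread across every purchaser of the commodity.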

A.1.4 CONCLUSIONS, APPLICATIONS, AND LIMITATIONS

Earlier work using maximum-upper-bound analyses had shown the dangers
that might be encountered using results of input-output analyses.*
Therefore, these estimates of uncertainty on the actual data were needed to
check the maximum error bounds on the particular results we were interested
in using (e.g., elements of the energy sector rows of the 1967 Leontief
inverse matrix). It soon became evident from the magnitude of the
uncertainties in the parameter estimation process that our maximum upper
bound analysis would yield unsatisfactory results.

*See for example the results presented by Bullard and Sebald (1975).

The above information is given to further illuminate the context in
which these uncertainty estimates were made, and hopefully will
discourage inappropriate applications of the results.

Finally, I repeat that the uncertainty estimates presented here
are my own. I have listed many of the persons whom I interviewed, but
they have not endorsed my interpretations of those interviews. If the
absolute levels of the estimates are widely disputed (and I expect they
will be) perhaps at least the relative levels will be accepted. On
this basis we have performed stochastic error analyses on the 1967 U.S.
input-output model for several cases, including doubling the error margins
presented here, to determine the sensitivity of the results to systematic
bias in the estimates.
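
A toy version of such a sensitivity check is sketched below. It is not the procedure used in the study: the 2x2 coefficient matrix, the tolerance values, the choice of a normal distribution whose three-standard-deviation limit equals the bound, and all function names are assumptions made purely for illustration. It perturbs the coefficients, recomputes the Leontief inverse, and compares the spread of one inverse element with the tolerances as given and with the tolerances doubled.

```python
import random
import statistics


def leontief_inverse_2x2(a):
    """Return (I - A)^-1 for a 2x2 coefficient matrix A (nested lists)."""
    m11, m12 = 1.0 - a[0][0], -a[0][1]
    m21, m22 = -a[1][0], 1.0 - a[1][1]
    det = m11 * m22 - m12 * m21
    return [[m22 / det, -m12 / det], [-m21 / det, m11 / det]]


def spread_of_inverse(a, delta, scale=1.0, trials=2000, seed=0):
    """Standard deviation of inverse element (0, 0) under random error.

    Each coefficient is multiplied by (1 + e), where e is drawn from a
    normal distribution with standard deviation scale * delta / 3
    (assumption: the bound is treated as a three-sigma limit).
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(trials):
        perturbed = [
            [a[i][j] * (1.0 + rng.gauss(0.0, scale * delta[i][j] / 3.0))
             for j in range(2)]
            for i in range(2)
        ]
        samples.append(leontief_inverse_2x2(perturbed)[0][0])
    return statistics.pstdev(samples)


# Hypothetical coefficients and tolerances (not taken from the tables).
A = [[0.2, 0.3], [0.4, 0.1]]
DELTA = [[0.05, 0.10], [0.20, 0.05]]

base = spread_of_inverse(A, DELTA)
doubled = spread_of_inverse(A, DELTA, scale=2.0)
print(base, doubled)
```

Running both cases with the same seed isolates the effect of the doubled margins from sampling noise, which is the comparison of interest when testing for systematic bias in the estimates.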




A.2 DIRECT ENERGY ALLOCATIONS

Knecht (1975) estimated error tolerances on all physical-unit energy
transactions. These are coded in Table A.2-2, at the 90-sector level of
detail, and in Tables A.2-3 and A.2-4 at the 101-sector level, and the codes
are explained in Table A.2-1 below.

Table A.2-1
ENERGY TRANSACTION TOLERANCE CODES

Code                                  Tolerance

00                                    (μ = 0 and 3σ = 10^11 Btu)
01, 02, 09, 13                        .05
04, Ul, 16, 18, 19, 20, 24, 28        .10
03, 05, 06, 29, 30                    .15
07, 12, 14, 15, 17, 22, 23, 26, 27    .20
25                                    .25
08, 10, 11                            .30
10                                    .35

* Note that instead of the 368-sector level of aggregation, the results
presented here are consistent with the slightly aggregated 357-sector
breakdown described by Bullard & Herendeen (1975). Dummy sectors consuming
no energy have been deleted and public and private sectors producing the
same primary product have been combined.

TABLE A.2-2


TOLERANCE CODES FOR DIRECT ENERGY USE DATA (90 SECTOR)

Sector                           Energy Supplies
Number  Sector Name              Coal     Crude    Oil      Electric Gas

     1  COAL MINING              02       00       02       02       02
     2  CRUDE PETROLEUM, GAS     11       02       02       02       02
     3  REF'D PETROLEUM PROD'S   01       01       01       01       01
     4  ELECTRIC UTILITIES       01       00       01       01       01
     5  NATURAL GAS UTILITIES    11       01       11       11       01
     6  LIVESTOCK...PRODUCTS     11       00       08       08       11
     7  OTHER AGRIC'L PRODUCTS   11       00       08       08       11
     8  FORESTRY AND FISHERY..   11       00       11       11       11
     9  AG., FOR'Y...SERVICES    11       00       11       11       11
    10  IRON...ORES MINING       11       00       02       02       02
    11  NONFERROUS ORES MINING   11       00       02       02       02
    12  STONE AND CLAY MINING.   02       00       02       02       02
    13  CHEMICALS, ETC. MINING   11       00       02       02       02
    14  NEW CONSTRUCTION         03       00       11       11       11
    15  MAINT. AND REPAIR CON.   03       00       11       11       11
    16  ORDNANCE AND ACCESSOR.   03       00       05       03       03
    17  FOOD AND KINDRED PROD.   02       00       04       02       02
    18  TOBACCO MANUFACTURING    02       00       04       02       02
    19  FABRIC...THREAD MILLS    02       00       04       02       02
    20  MISC. TEXTILE...FLOOR.   03       00       05       11       03
    21  APPAREL                  03       00       05       03       03
    22  MISC. FAB. TEXTILE PRO   11       00       11       11       11
    23  LUMBER...PROD'S, EXCEP   03       00       05       03       03
    24  WOODEN CONTAINERS        11       00       05       03       11
    25  HOUSEHOLD FURNITURE      03       00       05       03       03
    26  OTHER FURNITURE AND      03       00       05       11       03
    27  PAPER AND...EXCEPT...    02       00       04       02       02
    28  PAPERB'D CONTAINERS      02       00       04       02       02
    29  PRINTING AND PUBLISH'G   03       00       05       03       03
    30  CHEMICALS AND...PROD'S   02       02       04       02       02
    31  PLASTICS AND...MATER'S   02       00       04       02       02
    32  DRUGS,...PREPARATIONS    02       00       04       02       02
    33  PAINTS AND...PRODUCTS    02       00       04       02       02
    34  PAVING MIXTURES AND...   11       00       06       02       02
    35  ASPHALT FELTS AND COAT   02       00       06       02       02
    36  RUBBER AND...PRODUCTS    02       00       04       02       02
    37  LEATHER TANNING AND...   03       00       05       11       11
    38  FOOTWEAR AND...PROD'S    05       00       05       03       03
    39  GLASS AND GLASS PROD'S   02       00       04       02       02
    40  STONE AND CLAY PROD'S    02       00       04       02       02
    41  PRIM. IRON AND STEEL..   02       00       04       02       02
    42  PRIM. NONFERROUS METAL   02       00       04       02       02
    43  METAL CONTAINERS         02       00       06       02       02
    44  HEAT., PLUMB...PROD'S    03       00       05       03       03
    45  SCREW MACH. PROD'S,...   02       00       04       02       02
    46  OTHER FAB. METAL PROD.   03       00       07       03       03
    47  ENGINES AND TURBINES     02       00       04       02       02
    48  FARM MACHINERY           02       00       04       02       02
    49  CONSTRUCTION,...EQUIP.   03       00       05       03       03
    50  MAT. HANDLING...EQUIP.   03       00       05       11       03

TABLE A.2-2 (continued)


Sector                           Energy Supplies
Number  Sector Name              Coal     Crude    Oil      Electric Gas

    51  METAL WORKING...EQUIP'T  02       00       04       02       02
    52  SPEC. INDUSTRY...EQUIP   03       00       05       11       03
    53  GEN. INDUSTRIAL...EQ'T   02       00       04       02       02
    54  MACHINE SHOP PRODUCTS    03       00       05       03       03
    55  OFF., COMP'G...MACHINE   11       00       04       02       02
    56  SERV. IND. MACHINES      03       00       05       11       03
    57  ELEC. TRANS...APPARAT.   03       00       05       03       03
    58  HOUSEHOLD APPLIANCES     03       00       05       11       03
    59  ELEC. LIGHT'G...EQUIP.   03       00       05       03       03
    60  RADIO, TV, COM. EQUIP.   03       00       05       03       03
    61  ELEC. COMPONENTS...      02       00       04       02       02
    62  MISC. ELEC...SUPPLIES    03       00       05       11       11
    63  MOTOR VEHICLES...EQ'T    02       00       04       02       02
    64  AIRCRAFT AND PARTS       02       00       04       02       02
    65  OTHER TRANS. EQUIPMENT   0?       00       04       02       02
    66  PROFESSIONAL...SUPPLIE   03       00       05       11       11
    67  OPTICAL...EQUIP. AND...  03       00       04       02       02
    68  MISC. MANUFACTURING      03       00       05       03       11
    69  RAILROADS AND...SERV'S   09       00       09       09       11
    70  ...HIGHWAY PASS. TRAN.   11       00       09       09       11
    71  MOTOR FREIGHT TRANS...   11       00       09       11       11
    72  WATER TRANSPORTATION     0?       00       09       11       11
    73  AIR TRANSPORTATION       11       00       09       11       11
    74  PIPE LINE TRANSPORTA'N   11       00       11       11       09
    75  TRANSPORTATION SERVICES  11       00       00       11       11
    76  COM'NS EXCEPT RADIO...   11       00       11       11       11
    77  RADIO AND TV BROADCAST   11       00       11       11       11
    78  WATER AND SAN. SERV'S    11       00       11       11       11
    79  WHOLESALE AND RETAIL..   11       00       11       11       11
    80  FINANCE AND INSURANCE    11       00       11       12       11
    81  REAL ESTATE AND RENTAL   no       00       11       11       11
    82  HOTELS AND...EXCEPT...   11       00       11       11       51
    83  BUSINESS SERVICES        11       00       11       12       11
    84  AUTO. REPAIR AND SERV.   n        00       11       12       11
    85  AMUSEMENTS               11       00       11       12       11
    86  MED., ED. SERV'S AND..   11       00       11       11       11
    87  FED. GOV'T ENTERPRISES   11       00       11       12       11
    88  STATE AND LOCAL...EN'S   11       00       11       12       11
    89  BUS. TRAV...AND GIFTS    00       00       00       00       00
    90  OFFICE SUPPLIES          03       00       00       00       00
    91  PERS. CONSUMP'N EXPEN.   11       00       01       12       01
    92  GROSS...CAP. FORMATION   00       00       00       00       00
    93  NET INVENTORY CHANGE     01       01       01       00       01
    94  NET EXPORTS              01       01       01       01       01
    95  FED. GOV'T...DEFENSE     11       00       11       11       11
    96  FED. GOV'T...OTHER       11       00       11       11       11
    97  STATE...GOV'T...EDUC'N   11       00       11       11       11
    98  STATE...HEALTH, SAN.     11       00       11       12       11
    99  STATE...GOV'T...SAFETY   11       00       11       12       11
   100  STATE...GOV'T...OTHER    11       00       11       12       11

TABLE A.2-3


TOLERANCE CODES FOR ENERGY END USES DATA (101 SECTOR)

                                                      Misc.                       Misc.
                                                                                  Elec.
Sector                                  Feed-  Mot.   Ther.  Water  Space  Air    Pow.
Number  Sector Name              Coke   Stocks Pow.   Uses   Heat   Heat   Cond.  Uses

     1  COAL MINING              00     14     14     16     15     00     00     18
     2  CRUDE PETROLEUM, GAS     00     14     14     16     15     00     00     18
     3  GASIFIED COAL            00     21     21     21     21     21     21     21
     4  REF'D PETROLEUM PROD'S   00     13     14     19     15     14     17     20
     5  NATURAL GAS UTILITIES    00     14     14     00     14     22     14     23
     6  FOSSIL ELECTRIC UTIL'S   00     14     14     30     14     22     14     23
     7  NUCLEAR ELEC. UTIL'S     00     14     14     00     14     22     14     23
     8  RENEWABLE ELEC. UTIL'S   00     14     14     00     14     22     14     23
     9  ORE-REDUC. FEEDSTOCKS    00     00     00     00     00     00     00     00
    10  CHEMICAL FEEDSTOCKS      00     00     00     30     00     00     00     00
    11  MOTIVE POWER             00     00     00     00     00     00     00     00
    12  MISC. THERMAL USES       00     00     00     30     00     00     00     00
    13  WATER HEAT               00     00     00     30     30     00     00     00
    14  SPACE HEAT               00     00     00     30     00     00     00     00
    15  AIR-CONDITIONING         00     00     00     00     00     00     00     00
    16  MISC. ELEC. POWER USES   00     00     00     00     00     00     00     00
    17  LIVESTOCK...PRODUCTS     00     u      14     00     14     22     00     23
    18  OTHER AGRIC'L PRODUCTS   00     13     13     30     14     22     00     23
    19  FORESTRY AND FISHERY..   00     14     14     00     14     22     00     23
    20  AG., FOR'Y...SERVICES    00     14     14     00     14     22     00     23
    21  IRON...ORES MINING       00     n      14     16     15     00     00     18
    22  NONFERROUS ORES MINING   00     14     14     16     15     00     00     18
    23  STONE AND CLAY MINING.   00     14     14     16     15     00     00     18
    24  CHEMICALS, ETC. MINING   00     14     14     16     15     00     00     18
    25  NEW CONSTRUCTION         00     24     14     25     25     25     00     23
    26  MAINT. AND REPAIR CON.   00     24     14     25     25     00     00     23
    27  ORDNANCE AND ACCESSOR.   00     14     14     29     15     14     17     30
    28  FOOD AND KINDRED PROD.   13     14     14     19     15     14     17     20
    29  TOBACCO MANUFACTURING    00     14     14     19     15     14     17     20
    30  FABRIC...THREAD MILLS    00     14     14     19     15     14     17     20
    31  MISC. TEXTILE...FLOOR.   00     14     14     29     15     14     17     30
    32  APPAREL                  00     14     14     29     15     14     17     30
    33  MISC. FAB. TEXTILE PRO   00     14     14     29     15     14     17     30
    34  LUMBER...PROD'S, EXCEP   00     14     14     29     15     14     17     30
    35  WOODEN CONTAINERS        00     14     14     29     15     14     00     30
    36  HOUSEHOLD FURNITURE      00     14     14     29     15     14     00     30
    37  OTHER FURNITURE AND      00     14     14     29     15     14     00     30
    38  PAPER AND...EXCEPT...    00     13     14     19     15     14     17     20
    39  PAPERB'D CONTAINERS...   00     14     14     19     15     14     17     20
    40  PRINTING AND PUBLISH'G   00     14     14     29     15     14     17     30
    41  CHEMICALS AND...PROD'S   00     13     14     19     15     14     17     20
    42  PLASTICS AND...MATER'S   00     13     14     19     15     14     17     20
    43  DRUGS,...PREPARATIONS    00     14     14     19     15     14     17     20
    44  PAINTS AND...PRODUCTS    00     13     14     19     15     14     17     20
    45  PAVING MIXTURES AND...   00     13     14     19     15     14     00     20
    46  ASPHALT FELTS AND COAT   00     13     14     19     15     14     00     20
    47  RUBBER AND...PRODUCTS    00     14     14     19     15     14     17     20
    48  LEATHER TANNING AND...   00     14     14     29     15     14     00     30
    49  FOOTWEAR AND...PROD'S    00     14     14     29     15     14     00     30
    50  GLASS AND GLASS PROD'S   00     14     14     19     15     14     17     20
    51  STONE AND CLAY PROD'S    13     14     14     19     15     14     17     20
    52  PRIM. IRON AND STEEL..   13     14     14     19     15     14     17     20
    53  PRIM. NONFERROUS METAL   13     14     14     19     15     14     17     20


TABLE A.2-3 (continued)

                                                      Misc.                       Misc.
                                                                                  Elec.
Sector                                  Feed-  Mot.   Ther.  Water  Space  Air    Pow.
Number  Sector Name              Coke   Stocks Pow.   Uses   Heat   Heat   Cond.  Uses

    54  METAL CONTAINERS         00     14     14     19     15     14     00     20
    55  HEAT., PLUMB...PROD'S    13     14     14     29     15     14     17     30
    56  SCREW MACH. PROD'S,...   00     14     14     19     15     14     17     20
    57  OTHER FAB. METAL PROD.   13     14     14     29     15     14     17     30
    58  ENGINES AND TURBINES     13     14     14     19     15     14     17     20
    59  FARM MACHINERY           13     14     14     19     15     14     17     20
    60  CONSTRUCTION,...EQUIP.   13     14     14     29     15     14     17     30
    61  MAT. HANDLING...EQUIP.   00     14     14     29     15     14     17     30
    62  METAL WORKING...EQUIP'T  00     14     14     19     15     14     17     20
    63  SPEC. INDUSTRY...EQUIP   13     14     14     29     15     14     17     30
    64  GEN. INDUSTRIAL...EQ'T   13     14     14     19     15     14     17     20
    65  MACHINE SHOP PRODUCTS    13     14     14     29     15     14     17     30
    66  OFF., COMP'G...MACHINE   00     14     14     30     15     14     17     20
    67  SERV. IND. MACHINES      00     14     14     29     15     14     17     30
    68  ELEC. TRANS...APPARAT.   00     14     14     29     15     14     17     30
    69  HOUSEHOLD APPLIANCES     00     14     14     29     15     14     17     30
    70  ELEC. LIGHT'G...EQUIP.   13     14     14     29     15     14     17     30
    71  RADIO, TV, COM. EQUIP.   00     14     14     29     15     14     17     30
    72  ELEC. COMPONENTS...      00     14     14     19     15     14     17     20
    73  MISC. ELEC...SUPPLIES    00     14     14     29     15     14     17     30
    74  MOTOR VEHICLES...EQ'T    13     14     14     19     15     14     17     20
    75  AIRCRAFT AND PARTS       00     14     14     19     15     14     17     20
    76  OTHER TRANS. EQUIPMENT   00     14     14     19     15     14     17     20
    77  PROFESSIONAL...SUPPLIE   00     14     14     29     15     14     17     30
    78  OPTICAL...EQUIP. AND...  00     14     14     19     15     14     17     20
    79  MISC. MANUFACTURING      00     14     14     29     15     14     17     30
    80  RAILROADS AND...SERV'S   00     13     13     30     15     22     00     18
    81  ...HIGHWAY PASS. TRAN.   00     13     13     30     15     22     00     18
    82  MOTOR FREIGHT TRANS...   00     13     13     30     15     22     00     18
    83  WATER TRANSPORTATION     00     13     13     30     15     22     00     18
    84  AIR TRANSPORTATION       00     13     13     30     15     21     00     18
    85  PIPE LINE TRANSPORTA'N   00     22     22     00     15     22     00     18
    86  TRANSPORTATION SERVICES  00     90     00     30     15     22     00     18
    87  COM'NS EXCEPT RADIO...   00     27     27     30     14     22     14     23
    88  RADIO AND TV BROADCAST   00     27     27     00     14     22     14     23
    89  WATER AND SAN. SERV'S    00     27     27     30     14     22     14     23
    90  WHOLESALE AND RETAIL..   00     27     27     26     26     22     14     23
    91  FINANCE AND INSURANCE    00     27     27     30     14     22     14     23
    92  REAL ESTATE AND RENTAL   00     27     27     30     14     22     14     23
    93  HOTELS AND...EXCEPT...   00     27     27     26     52     52     14     23
    94  BUSINESS SERVICES        00     27     27     30     14     22     14     23

95 

AUTO.  REPAIR  AND  SERV. 

00 

27 

27 

30 

14 

22 

14 

23 

96 

AMUSEMENTS 

00 

27 

27 

26 

26 

22 

14 

23 

97 

MED./  ED.  SERV'S  AND.. 

00 

27 

27 

?6 

26 

22 

14 

23 

98 

FED.  GOV'T  ENTERPRISES 

00 

27 

27 

30 

14 

22 

14 

23 

99 

STATE  AND  LOCAL EN'S 

00 

27 

27 

30 

14 

22 

14 

23 

100 

BUS.  TRAV AND  GIFTS 

00 

00 

00 

00 

00 

00 

00 

00 

101 

OFFICE  SUPPLIES 

00 

00 

00 

30 

00 

00 

00 

00 

102 

PERS.  CONSUMP'N  EXPEN. 

00 

28 

28 

28 

28 

28 

28 

28 

103 

bROSS...CAP.  FORMATION 

00 

00 

00 

30 

00 

00 

00 

00 

10U 

NET  INVENTORY  CHANGE 

00 

00 

00 

00 

00 

00 

00 

00 

105 
106 

NET  EXPORTS 

00 

00 

00 

00 

00 

00 

00 

00 

FED.  GOV'T DEFENSE 

00 

13 

13 

26 

26 

22 

14 

23 

107 
108 

FED.  GOV'T OTHER 

00 

27 

27 

30 

14 

22 

14 

23 

STATE..  .GOV'T.  .  .EDUC'N 

00 

27 

27 

26 

26 

22 

14 

23 

109 

STATE HEALTH/ SAN. 

00 

27 

27 

26 

26 

22 

14 

23 

110 

STATE GOV'T SAFETY 

00 

27 

27 

00 

14 

22 

14 

23 

111 

STATE. ..GOV'T. . .OTHER 

00 

27 

27 

30 

14 

22 

14 

23 

A.3   DISPERSION FACTORS FOR SMALL-MAGNITUDE FIGURES

Often the uncertainty on a column of figures, \( x_i \), would be described
in terms of "a dispersion factor \( D_1 \) for \( x_i = B \) increasing to a
dispersion factor \( D_2 \) for the smallest value reported." Taking this
lower bound to be \( x_i = A \) where A = $10   for the 1967 U.S. input-output
tables, and assuming a linear dependence of D(x) on log(x), we obtain
the following expression for D as a function of \( x_i \). Let

\[ D(x) = a \log(x) + b \]

where

\[ a = \frac{D_2 - D_1}{\log A - \log B}, \qquad
   b = \frac{D_1 \log A - D_2 \log B}{\log A - \log B}. \]

It is easy to verify that D(x) takes values \( D_1 \) and \( D_2 \)
at x = B and x = A respectively.

Obviously  this  is  a  crude  approximation,  but  it  actually  may  be 
even  too  refined  when  viewed  from  the  perspective  of  the  person  estima- 
ting the  uncertainty. 
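
The interpolation of section A.3 can be sketched in a few lines of Python. This is an illustration only: the function name `dispersion_factor` and the numerical values of A, B, D1 and D2 below are hypothetical, and base-10 logarithms are assumed (the result does not depend on the base, since only ratios of logarithms enter).

```python
import math

def dispersion_factor(x, a_val, b_val, d1, d2):
    """Interpolate D(x) linearly in log(x).

    D equals d1 at x = b_val and d2 at x = a_val (the smallest reported value).
    """
    log_a, log_b = math.log10(a_val), math.log10(b_val)
    slope = (d2 - d1) / (log_a - log_b)
    intercept = (d1 * log_a - d2 * log_b) / (log_a - log_b)
    return slope * math.log10(x) + intercept

# Hypothetical inputs: D = 1.2 at x = 1000, rising to D = 2.0 at x = 1.
A, B, D1, D2 = 1.0, 1000.0, 1.2, 2.0
for x in (A, 31.6, B):
    print(x, round(dispersion_factor(x, A, B, D1, D2), 3))
```

The two end points reproduce D2 and D1, which is the check mentioned in the text.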

A.4   PROBABLE VALUES OF "ZERO" ELEMENTS

Since few transactions can be defined to be zero, the published
figures truncated at $10  dollars may be misleading. It is probably
true that if we examined in detail the transactions of all firms in the
U.S. defined by a particular transaction cell in the I-O table, we would
find at least one nonzero transaction. Therefore the following
approximation was used to estimate the probable distribution of nonzero
values between the lower and upper bounds [0, $10  ].

Let X be the absolute value of a normal random variable Y with
mean 0 and standard deviation σ equal to one third of the upper bound.
Then X takes nearly all its values between 0 and the upper bound. By
truncating X at the upper bound, in the sense that larger values are
discarded and resampled, the resulting random variable takes all of
its values in [0, $10  ] with the great bulk of its unit probability
accumulated near zero.

For  direct  energy  transactions,  the  cutoff  was  10  ^Btu,  which 
corresponds  to  approximately  the  same  dollar  value. 

Details  of  the  "folded  normal"  distribution  are  given  in  Appendix  C. 
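
A minimal sketch of this sampling scheme follows, assuming that three sigma of the underlying normal is set equal to the cutoff (the convention used in Appendix C). The function name `sample_zero_cell`, the seed, and the cutoff of 1.0 (that is, values expressed in units of the cutoff) are illustrative choices, not part of the original study.

```python
import random

def sample_zero_cell(cutoff, rng=random):
    """Draw one value in [0, cutoff] from a folded normal with 3 sigma = cutoff.

    Values above the cutoff are discarded and resampled.
    """
    sigma = cutoff / 3.0
    while True:
        x = abs(rng.gauss(0.0, sigma))
        if x <= cutoff:
            return x

rng = random.Random(12345)          # arbitrary seed, for repeatability only
draws = [sample_zero_cell(1.0, rng) for _ in range(10000)]
print(min(draws), max(draws), sum(draws) / len(draws))
```

Because the cutoff sits at three sigma, only a small fraction of draws is rejected, and most of the accepted values lie close to zero.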

Appendix B.   Sector Definitions

TABLE B-1.   30 Sector Model

30 SECTOR
MODEL     BEA SECTOR              SECTOR NAME

 1.       7                       Coal
 2.       8                       Crude Oil and Natural Gas
 3.       31.01                   Refined Petroleum Products
 4.       68.01                   Electric Utilities
 5.       68.02                   Natural Gas Utilities
 6.       1-4                     Agriculture
 7.       5-10                    Mining
 8.       11-12                   Construction
 9.       14, 15, 29              Food and Drugs
10.       16-19                   Textiles and Apparel
11.       20-26                   Wood and Paper Products
12.       27, 28, 30-32           Paint, Plastics and Oil Products
13.       33, 34                  Leather and Shoes
14.       35, 36                  Stone, Clay and Glass Products
15.       37-42                   Metals and Metal Products
16.       43-52                   Machinery
17.       53-58                   Electrical Equipment and Appliances
18.       59-61                   Cars, Planes and Transport Equipment
19.       62-64, 13               Miscellaneous Manufacturing
20.       65.01                   Rail Transport
21.       65.02                   Local Passenger Transport
22.       65.03                   Truck Transport and Warehousing
23.       65.04                   Water Transport
24.       65.05                   Air Transport
25.       65.06                   Pipeline Transport
26.       66-67                   Radio, TV, Communications
27.       69                      Wholesale and Retail Trade
28.       70-71                   Finance, Insurance and Real Estate
29.       65.07, 68.03, 72-79     Services
30.       81-82                   Business Travel and Office Supplies

Appendix C.   Treatment of Zero Values

In section 1 the variance of a "folded" normal is computed while
section 2 details the relationship between the variance of a lognormal
and its dispersion factor.

C.1  Variance of a "Folded" Normal Distribution

The variance of any random variable, X, is given by
\( Var(X) = E(X^2) - (E(X))^2 \), where E denotes expected value. If Y is
N(0,1) then \( 1 = Var(Y) = E(Y^2) - 0^2 \), so \( E(Y^2) = 1 \). Z is said to
be "folded" N(0,1) if Z = ABS(Y). Then \( E(Z^2) = E(Y^2) = 1 \). Therefore
\( Var(Z) = E(Z^2) - (E(Z))^2 = 1 - (E(Z))^2 \). But

\[ E(Z) = \frac{2}{\sqrt{2\pi}} \int_0^\infty t \, e^{-t^2/2} \, dt
        = \frac{2}{\sqrt{2\pi}} \left[ -e^{-t^2/2} \right]_0^\infty
        = \frac{2}{\sqrt{2\pi}} = \sqrt{\frac{2}{\pi}}, \]

therefore

\[ Var(Z) = 1 - (E(Z))^2 = 1 - \frac{2}{\pi}. \]

If we fold a normal random variable with three sigmas equal to b then the
variance computed above is multiplied by a factor of \( (b/3)^2 \). Conversely,
if we know the variance V of a folded normal then one sigma of the underlying
normal is

\[ \sqrt{\frac{V}{1 - 2/\pi}}. \]
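
The two relations of section C.1 can be checked numerically with the short sketch below. The function names and the three-sigma bound b = 1.0 are illustrative assumptions.

```python
import math

FOLDED_UNIT_VARIANCE = 1.0 - 2.0 / math.pi   # Var(|Y|) for Y ~ N(0, 1)

def folded_variance(b):
    """Variance of a folded normal whose underlying three sigma equals b."""
    return FOLDED_UNIT_VARIANCE * (b / 3.0) ** 2

def underlying_sigma(v):
    """One sigma of the underlying normal, given folded-normal variance v."""
    return math.sqrt(v / FOLDED_UNIT_VARIANCE)

b = 1.0                                  # hypothetical three-sigma bound
v = folded_variance(b)
print(round(FOLDED_UNIT_VARIANCE, 4))    # 0.3634
print(round(underlying_sigma(v), 4))     # 0.3333, i.e. b / 3
```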


C.2  Variance of a Lognormal Distribution

If a given cell is lognormal with published mean M and three sigma
dispersion factor D, then we sample for this cell by exponentiating a
\( N(\alpha, \beta^2) \) random variable where

\[ \beta = \frac{\ln D}{3} \quad \text{and} \quad
   \alpha = \ln M - \frac{\ln^2 D}{18}. \]

The variance of a lognormal random variable with parameters \( \alpha \) and
\( \beta \) is given by

\[ V = e^{2\alpha + \beta^2} \left( e^{\beta^2} - 1 \right)
     = M^2 \left( e^{\beta^2} - 1 \right). \]

Substituting \( \ln D / 3 \) for \( \beta \) we obtain

\[ V = M^2 \left[ \exp\left( \frac{\ln^2 D}{9} \right) - 1 \right]. \]

Conversely, we can solve this equation for D:

\[ D = \exp\left( 3 \sqrt{\ln\left( 1 + V/M^2 \right)} \right). \]

Incidentally, at the changeover point from normal to lognormal we have
\( \mathrm{ABS}(\sigma/M) = .4/3 \) or \( V/M^2 = .0178 \). At this point
D = 1.489. As we cross the changeover point from normal to lognormal we
switch from a normal with range between 60% and 140% of the published mean
value to a lognormal with D = 1.489 and 1/D = .67.
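
As a check on the relations of section C.2, the sketch below computes the lognormal parameters, converts between V and D, and reproduces the changeover value of D. The function names are illustrative, and Python's standard `random.gauss` stands in for whatever normal generator is actually used.

```python
import math
import random

def lognormal_params(mean, d):
    """Return (alpha, beta) so that exp(N(alpha, beta^2)) has the given mean
    and three-sigma dispersion factor d."""
    beta = math.log(d) / 3.0
    alpha = math.log(mean) - beta ** 2 / 2.0   # equals ln M - (ln D)^2 / 18
    return alpha, beta

def variance_from_d(mean, d):
    """Variance V of the lognormal cell with mean M and dispersion factor D."""
    return mean ** 2 * (math.exp(math.log(d) ** 2 / 9.0) - 1.0)

def d_from_variance(mean, v):
    """Invert variance_from_d: dispersion factor D from mean M and variance V."""
    return math.exp(3.0 * math.sqrt(math.log(1.0 + v / mean ** 2)))

def sample_cell(mean, d, rng=random):
    """Draw one perturbed cell value."""
    alpha, beta = lognormal_params(mean, d)
    return math.exp(rng.gauss(alpha, beta))

# Changeover point: sigma/M = 0.4/3, so V/M^2 = (0.4/3)^2 = .0178.
d_change = d_from_variance(1.0, (0.4 / 3.0) ** 2)
print(round(d_change, 3), round(1.0 / d_change, 2))   # 1.489 0.67
```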

APPENDIX D.   RANDOM NUMBER GENERATOR VERIFICATION
By  Robert  Bohrer  and  Dan  Putnam 

D.l   INTRODUCTION 

The  present  Monte  Carlo  study  of  the  sensitivity  of  Input-Output  data 
to  stochastic  estimation  errors  requires  the  use  of  a  fast,  reliable  nor- 
mal random  number  generator.   In  the  present  case  each  Monte  Carlo  trial 
requires  roughly  ten  thousand  random  numbers  to  perturb  the  parameters 
in  the  Input-Output  model.   Since  several  hundred  trials  are  required  to 
obtain  useful  results,  speed  is  an  important  consideration  in  the  choice 
of  a  generator  for  this  application.   Furthermore,  a  necessary  condition 
for  the  validity  of  the  results  is,  of  course,  that  the  random  inputs 
conform  to  the  modeling  assumptions.   In  this  case  the  required  inputs 
are  independent,  normally  distributed  random  numbers. 

The generators in the International Mathematical and Statistical
Libraries† were given special consideration for use in the Monte Carlo
study because of the good reputation of IMSL and the availability of IMSL
at most large IBM installations. One generator, GGNRF, was especially
appealing since it was designed specifically as a fast normal random
number generator and had already been optimized and coded in assembler
language for efficiency. The one flaw in the qualifications of GGNRF
was that the existing documentation of its statistical properties was
inadequate for our needs. Although the distributional properties had
been checked with several goodness of fit tests, independence had been
checked only up to five lags. Documentation of these tests may be found
in Kuki (1974). While it is impossible to test all aspects of randomness,
it is prudent to examine at least those aspects most important to the
particular application. The present Monte Carlo study requires
independence among the inputs to each Monte Carlo trial and among the
separate trials as well. The tests described in this paper were designed
by the first author to examine both types of independence. Tests of
normality were also included for the sake of completeness.

† Available from International Mathematical and Statistical Libraries
(IMSL) Inc., Sixth Floor, GNB Building, 7500 Bellaire Boulevard, Houston,
Texas 77036.

These  properties  may  be  tested  by  selecting  several  seeds  for  the 
generator  and  examining  three  sequences  obtained  from  each.  The  first 
two  sequences  are  chosen  so  that  in  the  actual  simulation  they  would  oc- 
cupy corresponding  positions  in  the  inputs  to  consecutive  simulation 
runs.  Thus,  the  first  N  numbers  drawn  from  a  seed  constitute  the  first 
sequence.  Then,  as  many  more  numbers  are  generated  as  would  be  needed 
to  complete  one  simulation  run.  The  second  sequence  then  consists  of 
the  next  N  numbers  drawn.  To  examine  the  independence  between  these 
first  two  sequences,  a  third  sequence  of  2N  numbers  is  formed  by  shuff- 
ling the  other  two  sequences  so  that  the  odd  numbered  entries  are  taken 
in  order  from  the  first  sequence  and  the  even  numbered  entries  are  taken 
in  order  from  the  second  sequence. 
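
The construction of the three sequences can be sketched as follows. Python's `random.Random` is used purely as a stand-in for the generator under test (GGNRF itself is not called here), and the name `three_sequences` and the run length of 10000 numbers per simulation run are illustrative assumptions.

```python
import random

def three_sequences(seed, n, run_length):
    """Build the two spaced sequences and their shuffled combination.

    run_length is the number of random numbers one simulation run consumes
    (assumed here to be at least n).
    """
    rng = random.Random(seed)        # stand-in for the generator under test
    first = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(run_length - n):  # skip the rest of the first run
        rng.gauss(0.0, 1.0)
    second = [rng.gauss(0.0, 1.0) for _ in range(n)]
    shuffled = [None] * (2 * n)
    shuffled[0::2] = first           # odd-numbered entries (1st, 3rd, ...)
    shuffled[1::2] = second          # even-numbered entries (2nd, 4th, ...)
    return first, second, shuffled

first, second, shuffled = three_sequences(seed=1, n=1024, run_length=10000)
print(len(shuffled), shuffled[0] == first[0], shuffled[1] == second[0])
```

With this layout, an odd lag in the shuffled sequence always pairs a number from the first sequence with one from the second, which is why the odd-lag autocovariances carry the information about dependence between runs.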

It is desirable to use tests which are sensitive to as many departures
from the hypotheses as possible. The methods used here allow tests of
the independence of X(t) and X(t+L) for all lags L less than or equal to
the sample size N. For this reason and because the statistical power of
these tests increases with sample size, the value of N used in defining
the sequences above should be as large as possible. While it would be
desirable to make N as large as the number of inputs to a Monte Carlo
trial, this possibility was precluded by considerations of the
availability of computing resources in the statistical analyses of the
random number samples. In this study N = 1024 was chosen with these
considerations in mind and because of the efficiency of working with a
power of 2 in Fast Fourier Transformation. Five seeds were selected for
the tests that follow, again with this number being determined partly by
considerations of available computing resources versus the quantity of
information generated.

* For a discussion of random number generation and testing see Jansson
(1966).

Section  2  details  the  tests  performed  on  the  shuffled  sequences 
obtained  from  the  five  seeds  to  check  independence  between  Monte  Carlo 
trials.   Section  3  describes  the  tests  on  the  two  unshuffled  sequences 
from  each  seed  to  examine  the  independence  of  the  inputs  to  a  single 
trial.  Finally, section 4 documents the test performed on the
unshuffled sequences to check that the samples are normally distributed
with  mean  zero  and  variance  one. 

D.2   INDEPENDENCE BETWEEN MONTE CARLO SAMPLES

To test the independence of consecutive runs from the Monte Carlo
experiment, the shuffled sequences from the five seeds were examined.
Given the hypothesis of independence between simulation runs, the sample
autocovariances at odd lags, L, of the shuffled sequences should be
independent and standard normal (i.e., N(0,1)) when multiplied by
\( \sqrt{2048-L} \).* A test of the independence of consecutive Monte Carlo
trials may then be made by using the Kolmogorov-Smirnov (K-S) statistic to
compare the sample distribution function of the adjusted autocovariances
at odd lags to the standard normal distribution function. The sample
used for this test was therefore \( s_1, \ldots, s_{256} \), with
\( s_i = AC_{2i-1} \sqrt{2048-(2i-1)} \), where the \( AC_{2i-1} \) are the
odd lag autocovariances. Table 1 lists the five K-S statistics and
P-levels obtained from the five shuffled sequences in this way. The tabled
statistics represent the square root of the sample size times the maximum
absolute difference between sample and hypothesized distribution
functions. The P-level is defined as the probability that a truly normal
sample would have achieved a larger value for the K-S statistic than the
value actually observed. Given the independence hypothesis, the P-levels
should be independent and uniformly distributed on the unit interval.
This fact allows the use of the following summary statistic: if
\( P_1, \ldots, P_n \) are independent and uniform on the unit interval,
then \( -2 \sum_{i=1}^{n} \ln(P_i) \) is chi square with 2n degrees of
freedom. This sample statistic and its own P-level are included in Table 1
along with the K-S statistics and their P-levels as an additional check
and summary.

* That the mean is 0 and the variance is 1/(2048-L) follows from simple
calculation with expectations. The asymptotic normality and independence
follow from careful application of a multivariate central limit theorem
such as theorem 9.2.3 in Wilks (1962). For example, normality of the
lag L covariance estimate is shown by applying the theorem to the two
sequences

\[ (X_1 X_{L+1}, \ldots, X_L X_{2L}, X_{2L+1} X_{3L+1}, \ldots) \]

and

\[ (X_{L+1} X_{2L+1}, \ldots, X_{2L} X_{3L}, X_{3L+1} X_{4L+1}, \ldots). \]

If the covariance estimates are adjusted for the sample mean, then the
tests for the hypothesized means and covariances can be made separately,
each having valid significance level, even under certain natural
discrepancies from the other hypothesis. Also, a theorem in section 20.6
of Cramér (1946) shows that the large sample distribution theory is
exactly the same as described above.

TABLE 1

Seed              Statistic    P-level

1                   .744         .64
2                   .869         .44
3                   .548         .92
4                   .807         .53
5                   .772         .59

-2*Σ ln(P_i)       5.03          .89

This derives from the fact that if P is uniform on [0,1], then -2*ln(P)
is exponentially distributed with distribution function \( 1 - e^{-x/2} \).
The result now follows by verifying that the associated density function
is that of a chi-square random variable with two degrees of freedom.
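
The summary statistic can be reproduced with a short sketch. The P-levels below are those of Table 1, with the second entry taken as .44 where the printed value is hard to read (an assumption that is consistent with the tabled summary value). The helper names are illustrative, and the upper-tail formula used is the standard closed form for a chi-square variable with an even number of degrees of freedom.

```python
import math

def fisher_statistic(p_levels):
    """Return -2 * sum(ln P_i), chi-square with 2n degrees of freedom
    when the P_i are independent and uniform on [0, 1]."""
    return -2.0 * sum(math.log(p) for p in p_levels)

def chi2_upper_tail_even(x, dof):
    """P(chi-square with an even number of degrees of freedom > x),
    using the closed-form Poisson sum valid for even dof."""
    half = dof // 2
    term, total = 1.0, 1.0
    for k in range(1, half):
        term *= (x / 2.0) / k
        total += term
    return math.exp(-x / 2.0) * total

p_levels = [0.64, 0.44, 0.92, 0.53, 0.59]   # Table 1; second entry assumed
stat = fisher_statistic(p_levels)
tail = chi2_upper_tail_even(stat, 2 * len(p_levels))
print(round(stat, 2), round(tail, 2))       # 5.03 0.89
```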

A plot of the sample distribution function with the worst fit (from
seed #2 in this case) is shown in Figure 1. Even in this worst case
note how closely the sample distribution function fits the hypothesized
distribution.
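
The time-domain check of this section can be sketched as follows, under stated assumptions: Python's `random.Random` stands in for a shuffled sequence from the generator under test, the autocovariances are computed about the hypothesized mean of zero with divisor (n - L), and the P-level uses the usual asymptotic Kolmogorov series. All function names are illustrative.

```python
import math
import random

def autocovariance(x, lag):
    """Sample autocovariance at the given lag, assuming a known mean of zero."""
    n = len(x)
    return sum(x[t] * x[t + lag] for t in range(n - lag)) / (n - lag)

def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def ks_statistic(sample):
    """sqrt(n) times the maximum gap between the empirical distribution
    function of the sample and the standard normal distribution function."""
    s = sorted(sample)
    n = len(s)
    gap = 0.0
    for i, v in enumerate(s):
        f = normal_cdf(v)
        gap = max(gap, abs(f - i / n), abs((i + 1) / n - f))
    return math.sqrt(n) * gap

def ks_p_level(stat, terms=100):
    """Asymptotic probability of a larger K-S statistic under the hypothesis."""
    return 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * stat * stat)
                     for k in range(1, terms + 1))

rng = random.Random(2)               # stand-in for a shuffled sequence
x = [rng.gauss(0.0, 1.0) for _ in range(2048)]
scaled = [autocovariance(x, lag) * math.sqrt(len(x) - lag)
          for lag in range(1, 512, 2)]          # odd lags 1, 3, ..., 511
stat = ks_statistic(scaled)
print(len(scaled), round(stat, 3), round(ks_p_level(stat), 2))
```

As a consistency check, the asymptotic series gives a P-level of about .44 for a statistic of .869 and about .64 for a statistic of .744.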

The K-S tests described above provide a check on independence in
the time domain; a check in the frequency domain was also performed. By
summing the Fast Fourier Transform of each of the shuffled sequences,
sample integrated periodograms were obtained.* Under the independence
hypothesis, the integrated periodogram values should increase linearly
from zero to one as the frequencies increase from zero to one-half. The
Grenander-Rosenblatt (G-R) statistic may be used to measure the
discrepancy of the sample integrated periodogram from linearity. If the
hypothesis is true then a factor of \( \sqrt{2048}/\sqrt{2} = 32 \) times
the maximum absolute difference between the sample integrated periodogram
and twice the corresponding frequency should have the distribution
calculated in Hannan (1967); departures from independence will tend to
make the sample statistics too large to fit the distribution. The five
sample statistics and corresponding P-levels are shown in Table 2 below
along with the chi-square summary statistic defined in the last section.
Seed #1 had the worst P-level so a graph of the corresponding sample
integrated periodogram is included in Figure 2.

* Computed with SOUPAC program FASPER available from Computing Services
Office, University of Illinois, Urbana, Illinois 61801.

TABLE 2

Seed              Statistic    P-level

1                  2.243         .04
2                   .725         .88
3                  1.578         .33
4                  1.004         .63
5                   .671         .92

-2*Σ ln(P_i)      10.00          .44

The results of this series of tests, like those of the preceding
section, are quite satisfactory and give no cause to suspect interdependence
between simulation runs.
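
The frequency-domain check can be sketched in the same spirit. This is an illustration under assumptions: a simple recursive radix-2 transform replaces the SOUPAC program, Python's `random.Random` stands in for a shuffled sequence, and the normalization of the statistic follows the verbal description above (normalization conventions for the G-R statistic vary, so the values are not claimed to match the tabled ones).

```python
import cmath
import math
import random

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out

def gr_statistic(x):
    """sqrt(n/2) times the largest gap between the normalized integrated
    periodogram and twice the frequency, over frequencies j/n, j = 1..n/2."""
    n = len(x)
    power = [abs(c) ** 2 for c in fft(x)[1:n // 2 + 1]]
    total = sum(power)
    running, gap = 0.0, 0.0
    for j, p in enumerate(power, start=1):
        running += p
        gap = max(gap, abs(running / total - 2.0 * j / n))
    return math.sqrt(n / 2.0) * gap

rng = random.Random(3)               # stand-in for a shuffled sequence
x = [rng.gauss(0.0, 1.0) for _ in range(2048)]
print(round(math.sqrt(2048 / 2.0)), round(gr_statistic(x), 3))
```

A strongly periodic input concentrates its power at one frequency, so its integrated periodogram jumps rather than rising linearly and the statistic becomes large, which is the kind of departure the test is meant to flag.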

D.3   INDEPENDENCE BETWEEN INPUTS TO A MONTE CARLO SAMPLE

While the frequency domain tests of the last section actually include
a test of the independence of sample inputs, further tests were performed
and are described below. The next two series of tests paralleled the two
series of the last section in methodology, but this time attention was
focused on the two sequences of 1024 numbers generated from each of the
five seeds. In the time domain, the autocovariances at lags
L = 1, ..., 512 of a sequence were adjusted by a factor of
\( \sqrt{1024-L} \) and tested with the Kolmogorov-Smirnov statistic. Again,
under the hypothesis of independence within each of the ten sequences of
1024 numbers, the adjusted autocovariances of each sequence should
constitute a standard normal sample. The resulting sample statistics
and P-levels along with the chi-square summary statistic and its P-level
are shown in Table 3. Similarly, the ten sequences were tested for
independence in the frequency domain with the Fast Fourier Transform just
as in the previous section. The sample statistics and summary statistic
are shown in Table 4. Again, the worst cases are illustrated for the time
and frequency domain tests in Figures 3 and 4.


TABLE 3

Sequence          Statistic    P-level

1A                 1.132         .15
1B                 1.214         .11
2A                 1.077         .20
2B                  .573         .90
3A                  .882         .42
3B                 1.106         .17
4A                  .745         .64
4B                  .657         .78
5A                  .645         .80
5B                  .491         .97

-2*Σ ln(P_i)      18.81          .53

TABLE 4

Sequence          Statistic    P-level

1A                  .659         .93
1B                 2.259         .05
2A                 2.439         .03
2B                 1.461         .29
3A                  .602         .96
3B                 1.524         .25
4A                 1.423         .31
4B                  .910         .71
5A                  .866         .75
5B                  .833         .79

-2*Σ ln(P_i)      22.55          .31

These results, like those of the preceding section, are quite
acceptable. Overall then, the independence properties of GGNRF are
very satisfactory.

D.4  TESTING FOR NORMALITY

A final series of tests was undertaken to examine the suitability
of GGNRF in an application calling for normally distributed random
numbers. First, the Kolmogorov-Smirnov goodness of fit test was applied
to the ten sequences of 1024 numbers with the results shown in Table 5.
Following the same format as in previous sections, the sample statistics
and their P-levels are given along with the summary chi-square statistic.
The plot of the sample distribution function with the worst fit is shown
in Figure 5.


TABLE 5

Sequence          Statistic    P-level

1A                 1.210         .10
1B                  .799         .55
2A                  .704         .70
2B                  .578         .89
3A                 1.111         .17
3B                 1.187         .12
4A                 1.095         .18
4B                  .590         .88
5A                 1.053         .22
5B                  .609         .85

-2*Σ ln(P_i)      21.57          .36

Although these results indicate good fit, especially given the
relatively large sample size, the sample variances and means were also
checked. Given the standard normality hypothesis, the sample variances
multiplied by 1024 should be chi-square distributed with 1023 degrees
of freedom. The sample variances and the corresponding P-levels under this
hypothesis are listed below in Table 6 along with the usual summary
statistic.


TABLE 6

Sequence          Statistic    P-level

1A                  .979         .77
1B                  .915         .974
2A                 1.106         .009
2B                  .936         .925
3A                  .997         .51
3B                 1.038         .19
4A                 1.033         .22
4B                 1.022         .30
5A                  .963         .79
5B                  .998         .50

-2*Σ ln(P_i)      22.11          .33

The sample means should be normal with mean 0 and variance 1/1024.
Multiplying the sample means by a factor of 32 should result in numbers
drawn from a standard normal population. However, the large size of
the P-levels in Table 7 gives cause for suspicion that the hypothesis
is not true. On the other hand, the sample size of 1024 was originally
chosen to be large enough to signal even acceptably small deviations from
ideal behavior; the very worst sample mean was only -.062. Furthermore,
given the amount of testing undertaken in this study, it is to be expected
that sooner or later some test results will go awry. To shed more light
on the matter, further tests of the mean tendency of GGNRF seemed
appropriate. Since only one seed would be needed for the actual Monte Carlo
application, seed #3 was selected for a more intensive examination.
Eleven thousand numbers were drawn from seed #3 and then discarded in
order to skip over the two strings of 1024 numbers previously tested.
Then ten consecutive strings of 1024 numbers were drawn and their sample
means were computed. The results are listed in Table 8. In addition,
ten new seeds were selected and samples of 1024 numbers drawn from each.
The data for these sequences is shown in Table 9.
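
The P-level attached to a sample mean can be sketched as below. The one-sided (upper-tail) convention is an assumption inferred from the tabled values rather than stated in the text, and the function name `mean_p_level` is illustrative. The means -.049 and .034 are taken from Table 7 and .064 from Table 8.

```python
import math

def mean_p_level(sample_mean, n=1024):
    """Probability that a standard normal exceeds sqrt(n) * sample_mean."""
    z = math.sqrt(n) * sample_mean
    return 0.5 * math.erfc(z / math.sqrt(2.0))

for m in (-0.049, 0.034, 0.064):
    print(m, round(mean_p_level(m), 2))   # .94, .14 and .02 respectively
```

Under this convention a run of negative sample means produces a run of P-levels near one, which is the pattern that prompted the additional tests.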

TABLE 7

Sequence          Statistic    P-level

1A                 -.049         .94
1B                 -.019         .72
2A                  .0083        .40
2B                  .0069        .41
3A                  .034         .14
3B                 -.042         .91
4A                 -.067         .98
4B                 -.021         .75
5A                 -.056         .95
5B                 -.019         .72

-2*Σ ln(P_i)       9.89          .97

TABLE 8

String            Statistic    P-level

1                   .035         .13
2                   .013         .34
3                  -.029         .82
4                  -.0035        .54
5                  -.036         .87
6                  -.016         .70
7                  -.0054        .57
8                   .025         .21
9                   .064         .02
10                 -.0060        .58

-2*Σ ln(P_i)      21.925         .34

TABLE 9

Seed              Statistic    P-level

1                   .041         .10
2                   .0053        .43
3                  -.04          .90
4                   .023         .23
5                  -.010         .63
6                  -.024         .78
7                   .029         .17
8                   .018         .28
9                   .0026        .47
10                 -.0035        .54

-2*Σ ln(P_i)      19.74          .48

The results of these tests certainly do much to allay the suspicions
raised by the results in Table 7. Neither would it seem that seed #3
has run into an area of systematically bad behavior (witness Table 8), nor
would it seem that there is an overall bias in the generator (witness Table
9). These tests together with the preceding K-S tests and sample variance
tests indicate that GGNRF has satisfactory distributional properties.
Overall then, GGNRF tests out as a satisfactory normal random number
generator.